Policy, Planning, and Research
WORKING PAPERS
Education and Employment
Population and Human Resources Department
The World Bank
July 1989
WPS 242
A Multi-Level Model
of School Effectiveness
in a Developing C(ountry
Marlaine E. Lockheed
and
Nicholas T. Longford
Schools in Thailand are more uniformly effective than previous
research in developing countries would suggest. Higher levels
of math achievement are associated with more qualified math
teachers, an enriched curriculum, and frequent use of textbooks.
The Policy, Planning, and Research Complex disuibues PPR Working Papers to disserninate the findings of work in progress and to
encourage the exchange of ideas among Bank staff and all others interested in development issues. These papers carry the names of
the au hors, reflect only their views, and should betused and cited accordingly. Thc findings. interpruations, and conclusions are the
authors own. They should not be attributed to the World Bank, its Board of Directors, its management, or any of its member countries.



Plc,Planning, and Research |
Education and Employment
What makes one school more effective than            * Lemin- was higher for boys, younger stu-
another -  particularly which inputs and man-      dents, ar.d for children who reported higher
agement practices most efficiently enhance         educational aspirations, less parental encourage-
student achievement -  has become the center of    ment, more confidence in their own mathematics
lively debate in the literature. Which method to   ability, greater interest in mathematics, and a
use to compare school effects particularly         feeling that mathematics wa. relevant to them.
concems analysts.
- Schools in Thailand were more uniform in
Lockheed and Longford used a multi-level       their effects on leaming than previous research
model to analyze what improved performance in    in developed countries had suggested would be
grade 8 mathematics in Thailand. They con-         the case.
cluded that:
The model developed by Lockheed and
* Schools in Thailand were equally effective     Longford was able to explain most variance
in teaching students eighth grade mathematics      between schools but significantly less with1.
(for example, in transforming pretest scores into  schools. Only one variable slope was observed:
posttest scores).                                  the relationship between educational aspirations
and achievement.
* Schools and classrooms contributed 32
percent of the variance in posttest scores and         Lockheed and Longford applied multi-level
individual characteristics 68 percent.             techniques to longitudinal data recently collected
by the Intemational Association for the Evalu-
G Greater learning occurred in schools having    ation of Educational Achievement in Thailand.
a higher proportion of teachers qualified to teach
mathematics, classrooms having an enriched         One question they tried to answer was: How do
curriculum and in which textbooks frequently       estimates obtained from the new multi-level
were used.                                         techniques compare with those obtained from
ordinary regression methods?
This paper is a product of the Education and Employment Division, Population and
Human Resources Department. Copies are available free from the World Bank, 1818
H Street NW, Washington DC 20433. Please contact Cynthia Cristobal, room S6-
001, extension 33640 (66 pages with tables).
The PPR Working Paper Series disseminates the findings of work under way in the Bank's Policy, Planning, and Rcsearch
Complex. An objective of the series is to get these findings out quickly, even if prcsentations arc less than fully polished.
The findings. intcrprctations, and conclusions in these papers do not necessarily represent official policy of the Bank.
Produced at the PPR Dissemination Center



Acknowledgements
The contributions of the International Association for the Evaluation of
Educational Achievement (IEA) in making the data available to
us and the extensive comments of Stephen Raudenbush on an earlier draft are
gratefully acknowledged.



CONTENTS
Page
INTRODUCTION    ....                .         .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  1
Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  2
Methodological Considerations . . . . . . . . . . . . . . . . . . .   5
CHAPTER I: THE DATA . . . . . . . . . . . . . . . . . . . . . . . . . . .  7
Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  7
Sample          ....       .         .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .     .  7
Method  ....            .         .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  8
Measures           .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .   .  .  .  .  .  .  .  .  .                    8
CHAPTER  II:    MODELS   .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .   .  .  .  .  .  .  .  .  .                          15
Variance  Component  Models    ....                       .         .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  15
Analytical  Framework                   ....       .         .   .   .   .   .   .   .   .   .   .   .   .   .   .   .    .   .   .  16
Variance Component Models Compared with OLS   . . . . . . . . . . . .   18
CHAPTER III: SCHOOL EFFECTS ON MATHEMATICS LEARNING . . . . . . . . . . .   21
Model  1:  Ordinary  Regression  (OLS)  ...  .                                 .   .   .   .   .   .   .   .   .   .   .   .   .  21
Model  2:  (Simple)  Variance  Component  Model  (VCS)  .   .   .   .   .   .   .   .   .   .                              22
Model  3:  Variable  Slopes  Model  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .              26
Model  4:  Comparison  of  the  Models    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .                27
Summary    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  29
CHAPTER IV: PUPIL BACKGROUND AND SCHOOL/CLASSROOM EFFECTS ON LEARNING . .   30
Overview   .....                .        .   .   .   .   .    .   .   .   .   .   .    .   .   .   .   .   .    .   .   .   .   .   .   .  30
Multiple  Regression  Models                      ....        .         .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  32
Modelling of Group-Level Variation (Random Slopes and Random
Differences)    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  44
Conditional Expectations of the Random Effects    . . . . . . . . . .   52
CHAPTER  V:  DISCUSSION   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .    55
Summary    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  56
Caveats         .   .   .   .   .   .   .   .   .   .   .   .   .   .  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  60
REFERENCES           .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  .   .   .   .   .   .   .   .   .   .   .   .   .   .  64



Tables
Table 1: Sample Characteristics and Variable Names, Descriptions and
Means (Proportions) of Student-Level Variables for Three
Data Sets
Table 2: Sample Characteristics and Names, Descriptions and Means
(Proportions) of Group-Level Variables for Three Data Sets
Table 3: Comparison of OLS and VCS 1odels of Grade 8 Mathematics
Posttest Predicted from the Pretest, Thailand, 1981-82
Table 4: OLS and VCS Model Estimates for 2,076 Students and
60 Classrooms/Schools Using All 31 Explanatory Variables,
Thailand, 1981-82
Table 5: OLS and VCS Model Estimates for 2,076 Students and 60
Classrooms/Schools Using 23 Explanatory Variables,
Thailand, 1981-82
Table 6: OLS and VCS Model Estimates for 2,804 Students and 80
Classrooms/Schools Using 23 Explanatory Variables,
Tnailand, 1981-82
Table 7: OLS and VCS Model Estimates for 2,804 Students and 80
Classrooms/Schools Using 17 Explanatory Variables,
Thailand, 1981-82
Table 8: OLS and VCS Model Estimates for 3,025 Students and 86
Classrooms/Schools Using 17 Explanatory Variables,
Thailand, 1981-82
Table 9: Summary of Tables



A MULTI-LEVEL MODEL OF SCHOOL EFFECTIVENESS IN A DEVELOPING COUNTRY
1N1RODUCTION
There are several central questions behind the research into school
effectiveness. First, do schools make a difference in how much a student
learns (that is, does the specific school in which a child is enrolled have a
particular impact on his or her achievement, independent of family
background)? Second, if so, what are the characteristics of the school that
account for this difference? Third, do certain schools affect certain types
of students differently than others?
These questions, first raised by Coleman in the 1960s, have been
reconsidered in the current research on the effectiveness of private schools
(Coleman, Hoffer and Kilgore 1982) and by a new generation of "effective
school" researchers (Aitkin and Longford 1986; Goldstein 1986; Raudenbush and
Bryk 1986; Reynolds 1985; Rutter 1983; Willms 1987). The new researchers have
investigated the questions through the application of new analytic techniques
that take into account the hierarchical nature of most data on education:
children within classrooms, classrooms within schools and schools within
educational authorities (e.g., districts).
Although appropriate methods for analyzing hierarchically
structured data on education have been available since the early 1970s
(Dempster, Laird and Rubin 1977; Lindley and Smith 1972), application of these
methods to educational policy decisions in developing countries has been
hampered by two important shortcomings: (i) the absence of computationally
efficient algorithms for multi-level analysis; and (ii) the lack of adequate
data (sufficient cases at each organizational level). Recent].y, new



2
computational methods have been developed that address the first problem
(Goldstein 1984, 1986; Longford 1987; Bryk, Raudenbush, Seltzer and Congdon,
Jr. 1986), and data sets sufficient for their application have been collected
in a number of developing countries.
This paper applies multi-level techniques to longitudinal data
recently collected by the International Association for the Assessment of
Educational Achievement (IEA) in Thailand to answer the following questions:
(i) do Thai middle schools affect student learning differentially? (ii) what
part of the variation in student learning is attributable to between school
characteristics versus between student characteristics? (iii) what
characteristics of teachers and schools enhance student achievement,
independent of student background? (iv) what is the comparative effectiveness
of alternative school inputs? (v) are the effects of schools uniform across
different students? and (vi) how do estimates obtained from the new, multi-
level techniques compare with those obtained from ordinary regression methods?
Background
The comparative effectiveness of schools in developing countries,
particularly the relative efficiency with which alternative inputs and
management practices enhance student achievement, has become the center of a
lively debate in the literature (see, for example, Fuller 1987; Harbison and
Hanushek 1989; Heyneman 1986; Lockheed and Hanushek 1988). These issues have
important implications for how governments and international development
agencies should allocate their limited resources--whether they should
concentrate on certain types of inputs (capital investment or lowering class
size) or should finance others (instructional materials, teacher or headmaster



3
training or student testing). In the United States and the United Kingdom,
the debate was sparked by studies that claimed to identify effective schools:
those that enhanced student achievement more than other schools working with
similar students and material inputs (see Rauienbush 1987 for a recent
review).
In developing countries, research on school effectiveness ha3 been
more limited, and studies examining the effects of alternative inputs on
student achievement have not taken into account the explicitly hierarchical
nature of the explanatory models and data. Instead, most research on
effective schools in developing countries has utilized a "production function"
approach that compares the relative effectiveness of alternative material and
non-material inputs and, to a lesser degree, teaching processes on student
achievement. The school characteristics most frequently examined have been
indicators of material inputs: per pupil expenditures, number of books,
presence of a library, presence of desks, teacher salaries and so forth./
The past decade has provided several important reviews of this research
~Avalos and Haddad 1981; Fuller 1987; Heyneman and Loxley 1983; Husen, Saha
and Noonan 1978; Schiefelbein and Simmons 1981; Simmons and Alexander, 1978).
Most of the reviews conclude that, when student background is controlled for,
school characteristics do have significant effects on achievement, and, in
many cases, the effects of school characteristics are greater than the effects
of family background.
!J The most extensive research using this type of model is reported in a
recent longitudinal study (Harbison and Hanushek 1989) of the effects of
material inputs on student achievement in rural Brazil.



4
Heyneman and Loxley (1983), for example, found that the variance in
student achievement c.plained by three family background variables averaged
8.6% across 17 developing countries, while the variance explained by school
characteristics amounted to 16%, nearly twice as great. Yet, overall, the
amount of variance in student achievement explained by variables related to
family background and school inputs in developing countries remains remarkably
low in comparison witn the results of similar studies conducted in developed
countries. Heyneman (1986) has argued strongly that the tailure of
conventional models to explain the variance in achievement is a consequence of
poorly conducted research. An equally strong case can be made regarding the
inadequacy of the models and indicators employed.
The more recent research on school effectiveness differs from
earlier approaches in four important ways. First, education production
function research has moved away from answering the questions of whether and
how much specific material and non-material inputs affect student achievement
to exploring other questions, including the effects of alternative inputs on
achievement (e.g., Harbison and Hanushek 1989) and the mechanisms whereby
material and non-material inputs affect achievement (Lockheed, Vail and Fuller
1987). Second, better and more culturally relevant indicators of students'
social background in developing countries have been utilized (e.g., Lockheed,
Fuller and Nyirongo 1987). Third, complex organizational models of student
achievement (e.g., Rosenholtz 1989) have begun to replace education production
function models. Fourth, research has begun to center on the classroom and
classroom processes as important determinants of learning, with specific focus
on the role of teachers and administrators as managers of student learning



5
(e.g., Lockheed and Komenan 1989; Lockheed, Fonacier and Bianchi 1989). This
paper addresses all four issues.
Method2ological Considerations
While matters of substantive cancern continue to d. ive the research
on effective schools, the "effective schools' issue has been fueled by
controversy over statistical methodology, interpretation and data (for
example, Sirotnik and Burstein 1985). The most important statistical issue is
the use of appropriate methods to analyze multi-level data. The argument
concerns how behavior at one level (e.g., classroom, school or district)
influences behavior at a different level (e.g., students) and how to estimate
these multi-level effects correctly.
Hierarchically structured data are common in social research,
because social institutions are typically hierarchically organized. However,
the commonly used statistical techniques for dealing with related data may
lead to biased estimates.y In particular, it has been established that, when
observations within clusters on any stratum are more homogeneous than those
between clusters, the use of ordinary regression methods (e.g., OLS) with such
data can lead to biased estimates of regression coefficients in unbalanced
designs and even to substantially biased standard errors for these estimates
in balanced designs. In that most policy research entails the use of
I/ These hierarchical structures result from design elements
(stratified sampling), data collection technicalities (e.g., interviewer
effect) or intrinsic interest in cross-level effects (e.g., the effects of
post-natal feeding programs on the relationship between birth weight and
subsequent cognitive development).
/ An extended discussion of this i3sue is provided by Goldstein (1987).



6
unbalanced designs, a serious problem may arise when ordinary least squares
regressiou estimates are used to quantify effects.
Proper analysis of multi-level data requires two distinct changes in
thinking about the data. First, the researcher must confront the demands of
the inherently hierarchical da%.a common to education at the stage of sample
design, so that sufficient numbers of units at each level are samnpled (e.g.,
adequate samples of schools and classrooms, in addition to the sample of
students). Second, and more important, hierarchical analysis allows a major
shift in how the effects of organizations on individuals may be viewed:
instead of considering only the effects of organizational characteristics on
organizational means, the effects on relationships are also modelled. For
example, certain school or c ssroom interventions may affect not only average
student achievement, but they may also lessen the degree of association
between family background and student achievement. Here an organization-level
force serves to mediate an individual-level effect.
Until recently, most discussions of multi-level analysis have
remained theoretical, bounded by the costs and computational requirements of
existing analytic tools. However, the recent development of new analytic
tools for analyzing multi-level data has energized the debate (Aitkin and
Longford 1986; Goldstein 1986; Mason, Wong and Entwisle 1984; and Raudenbush
and Bryk 1986). The development of the general EM algorithm (Dempster, Laird
and Rubin 1977) provided a theoretically satisfactory and computationally
manageable approach to estimation of covariance components in hierarchical
linear models.



7
To date, application of these methods in education policy research
has been limited to a relatively few studies of schools in developed
coLntries. To the best of the authors' knowledge, the present study is the
first such application to dat- from developing countries.
CHAPTER I: THE D
Context
The data used in this study come from the IEA Second International
Mathematics Study (SIMS) in Thailand, 1981-82, and aderess eig'.th grade
mathematics achievement. The structure of Thailand's education system
includes six primary school grades, three lower secondary school grades, three
upper secondary school grades and tertiary education. While the first six
years of schooling are compulsory, secondary education is not. At the time
the data were collected, 33% of the 14-year-old age cohort were enrolled in.
grade eight.
Sample
The IEA SIMS sample consisted of 99 mathematics teachers and their
4,030 eighth-grade students. It was derived from a two-stage, stratified
random sample of classrooms. The 13 primary sampling units were the 12
national educational regions of Thailand plus the capital, Bangkok. Within
each region, a random sample of lower secondary schools was selected. At the
second stage, a random sample of one class per school was selected from a list
of all eighth-grade mathematics classes within the school; only students



8
enrolled in school for the entire school year were included. The result was a
1% sample of eighth-grade mathematics classrooms within each region. This
design does not distinguish between the school and classroom levels, so that
only inferences about the aggregate of these effects are possible.
Method
At both the beginning and end of the school year, students were
administered a mathematics test covering five content areas of the curriculum
(arithmetic, algebra, geometry, statistics and measurement). Students also
completed a short background questionnaire at the pretest and a longer one at
the posttest administration. Teachers completed several instruments at
the posttest, including a questionnaire on their background and one on general
classroom processes. They also previded information about teaching
practices and characteristics of their randomly selected "target" class.
A school administrator provided data about the school.
Measures
The measures included indicators of student attitude and
achievement, of student social class background, of material and non-material
inputs at the school and classroom levels, and of classroom organization and
teaching practices. The following sections provide a description of each of
the variables analyzed in this paper (see Lockheed, Vail and Fuller 1987 for
an extended discussion); acronyms for the variables are givon in parentheses.
For easier orientation, the acronyms for pupil-level variables are given in
capital letters and for group-level (region/school/classroom) variables in
underlined lower-case letters. This distinction will be clear from Tables 1



9
and 2, which provide the definitions and summary statistics for all the
variables in the original data set and the data set developed as part of this
paper.
Mathematics achievement. The IEA developed five mathematics tests
for use in/SIMS. One of the tests was a 40-item instrument called the
core test. The remaining 4 tests were 35-item instruments called rotated
forms, designated A through D. The 5 test instruments contained roughly equal
proportions of items from each of the 5 areas of curriculum content, except
that the core test contained no statistics items. For purposes of this
analysis, we regard the instruments as parallel forms with respect to
mathematics content.
The IEA longitudinal design called for students to be administered
both the core test and one rotated form chosen at random at both the pretest
and posttest. In Thailand, students were pretested using the core test and
one rotated form. At the posttest, they again took the core test and one
rotated form that was different from the rotated form taken at the pretest.
Approximately equal numbers of students took each of the rotated forms test in
both test administrations.
One goal of this analysis was to predict posttest achievement as a
function of pretest performance and ozher determinants. Since students took
the core test during the pretest, their posttest scores would reflect, to some
degree, familiarity with the test items. For purposes of our study, instead
of using the core test, we analyze the scores obtained from the rotated forms,
after equating them to adjust for the differences in test length and
difficulty. In this analysis, we use equated rotated form formula scores for



10
both the pretest (XROT) and posttast (YROT) me sures of student achievement in
mathematics .'
Table 1: Sample Characteristics and Variable Names, Descriptions and Means (Proportions)
of Student-Level Variables for Three Data Sets
Means/Pro2ortions
Variable                                                            Data      Data       Data
Name                     Description                              Set 1     Set 2      Set 3
Sample
Students                                                        2,076     2.804       3,025
Classrooms                                                         60        80          86
Student-Level VariableS
XROT      Pretest mathematics achievement score                       9.15      8.83       8.83
XSEX      Student gender (O - female; 1 - male)                        .53        .53       .53
XAGE      Age in months                                             170.94    171.05    171.09
YFOCCI    Father's occupational status:
Unskilled or semi-skilled worker                           .15       .15        .15
Skilled worker                                             .44       .45        .46
Clerical or sales worker                                   .26       .26       .25
Professional or managerial worker                          .15       .15       .14
TMEDUC    Nnther's educational attainment
Very little or no schooling                                .26       .26        .26
Primary school                                             .58       .58        .58
Secondary school                                           .09       .09       .09
College, university or some form of tertiary ed.           .07       .07        .06
YHLANG    Use of language of instruction at home (O - no, 1 - yes)   .49         -          -
YHCALC    Calculator at home (O - no, 1 - yes)                         .31       -          -
YMOREED   Educational expectations
Less than two years                                        .08       .08       .08
Two to four years                                          .30       .31        .30
Five to seven years                                        .41       .41       .41
Eight or more years                                        .22       .20       .21
YPARENC   Parental encouragement (1 - high)                           2.12      2.10       2.09
YPERCEV   Perceived mathematics ability (1 - high)                    4.05      4.05       4.05
YFUTURE   Perceived future importance of mathematics (1 - low)        2.06      2.05       2.06
YDESIRE   Motivation to succeed in mathematics (1 - low)              5.47      5.47       5.47
4/ For more detail on the construction of the achievement measures, see
Lockheed, Vail and Fuller (1986).



11
TakI,I2  Sample Characteristics and Names, Descriptions and Means (Proportions)
of Group-Level Variables for Three Data Sets
Meanu/Progortions
Variable                                                           Data      Data       Data
Name                   Description                               Set 1     Set 2      Set 3
Students                                                       2,076     2,804      3,025
Classrooms                                                        60        80          86
Grou.-level Variables
senrolt   Number of students in school ('000)                      1.27      1.44        1.41
sdxsyr   Days in school year                                     195.04       -           -
sRutear   Pupil/teacher ratio in school                           14.86     L5.81      15.93
sgulumt   % of teachers in school qualified to teach math.          .57       .62          62
seci81    District per capita income (in 1000 bahts)              12.94     12.97         -
s$tre     Ability groupings for instruction
(0 - no; 1 - yes)                                         .46       .47       -
Teacher gender (0 - female, 1 - male)                     .33       .37        -
Teacher age in years                                   29.04        -
texptch   Years of teaching experience                             7.25       -
tedmath   Semesters of post-secondary mathematics                  3.95       -           -
tnstuds   Number of students in target class                      43.61     42.61        -
tmthsub   Math curriculum (0 - remedial or normal, 1 - enriched)  .22         .20         .18
txtbk  Frequency of use of textbook (0 - no; 1 - yes)               .55       .56         .58
cefee     Frequency of individual feedback                         2.15       -           -
taduinl   Minutes spent weekly on routine administration          26.84       -           -
torderl   Minutes spent weekly maintaining class order            19.40     20.27      20.33
tseatl    Minutes students spent weekly at seat or                53.76     54.57         -
blackboard
tvismat   Use of commercial visual materials (0 - no; I - yes)    .34         .40         -
tworkbk   Use of published workbooks (0 - no; 1 - yes)              .85       .83         .81



12
Student background characteristics. The basic background
information about each student included his or her gender (XSEX), age in
months (XAGE), paternal occupational status (YFOCCI), highest maternal
education (YMEDUC), home language (YHLANG) and home use of a four-function
calculator (YHCALC). Paternal occupation (YFOCCI) was classified into four
categories: (i) unskilled or semi-skilled worker, (ii) skilled worker,
(iii) clerical or sales worker, and (iv) professional or managerial worker.
Maternal education (YMEDUC) was classified into four categories: (i) very
little or no schooling, (ii) primary school, (iii) secondary school, and
(iv) college, university or some form of tertiary education.
Student attitudes and Derceptions. Five indices of student
attitudes and perceptions were included. Student educational
expectations (YMOREED) were measured by a single item that asked about the
number of years of full-time education the student expected to complete
after the current academic year. The following categories were defined:
(i) less than two years, (ii) two to four years, (iii) five to seven years,
and (iv) eight or more years. Parental encouragement (YPARENC) was measured by
a four-item index composed of responses on a Likert-type scale in which
students described their parents' interest in, and encouragement for,
mathematics achievement. For example, for the item "My parents encourage
me to learn as much mathematics as possible," the response alternatives ranged
from "exactly like" the student's parents (- 1) to "Not at all like" the
student's parents (- 5). The four items comprised a single factor, with
principal component factor loadings ranging from .72 to .83 and communality
of 2.43. A low score represented greater parental support. Perceived
mathematics ability (YPERCEV), perceived usefulness of mathematics



13
(YFUTURE) and motivation toward mathematics achievement (YDESIRE) were all
developed from a factor analysis of the student attitude survey, which
contained Likert-type items having response alternatives ranging from
"strongly disagree" (- 1) to "strongly agree" (- 5). The factors were
initially identified through varimax factor analyses and then confirmed
through principal component analyses, from which the factor scores were
constructed. For YPERCEV, a low value represented a positive attitude; for
YFUTURE and YDE'IRE, a high value represented a positive attitude.
School characteristics. This study looks at data on six school
characteristics.   Five are cor-;entional indicators of material and non-
material inputs: (i, school size in terms of the total number of students
enrolled (senrolt), an indicator of potential resources; (ii) length of the
school year in days (sdaysyr), an indicator of the time available for
instruction; (iii) student/teacher ratio in the school (sRutear), an indicator
of the availability of teacher resources for the student; (iv) peLcentage of
the teaching staff qualified to teach mathematics (sgualmt), an indicator of
the quality of teacher resources; and (v) pet capita income in 1981 at the
district level (sRci8l), another indicator of resources. One measure of
school organization is included: (vi) presence of ability grouping (sstream).
Teacher characteristics. Four teacher characteristics are analyzed:
(i) gender (tsex); (ii) age (tage); (iii) teaching experience (texptch); and
(iv) number of semesters of post-secondary mathematics education (tedmath).
The latter two variables are conventional indicators of teacher quality.
Classroom characteristics. Three characteristics of the classroom
are analyzed: (i) class size (tnstuds), an indicator of the teacher resources
available to the student in his/her mathematics class; (ii) remedial or



14
typical versus enriched mathematics subject matter (tmthsub), an indicator of
the quality of the curriculum for the student in a particular class; and
(iii) whether or not the teacher used textbooks frequently in the class
(txtbk), an indicator of the availability of instructional materials in the
classroom.
Teachins practices. Six variables referring to teaching practices
are considered: (i) providing feedback to students (cefeed), a composite index
of five elements of teaching practice: commenting on student work, reviewing
tests, correcting false statements, praising correct statements and giving
individual feedback; (ii) number of minutes per week the teacher spent on
routine administration (tadminl); (iii) maintaining class order (torderl);
(iv) monitoring assigned seatwork (tseatl); (v) using commercially produced
visual materials (tvismat); and (vi) using workbooks (tworkbk).   All
information on variables related to teaching practices were self-reported.
In summary, the data set contains information on 32 variables about
4,030 pupils from 99 schools. Of the 32 variables, 13 involve student
characteristics, 5 refer to the school, 4 to the teacher, 9 relate to the
classroom, and 1 is a characteristic of the district (catchment area). The
distinction between the variables related to pupils and to
classrooms/teachers/schools (henceforth called groups, since they are
confounded in the design) is important because they play different roles in
explaining variations in achievement.fiu
It should be noted that the complete data set consists of 13*4,030 + 19*99
- 54,271 units of data, although conventionally it would be conceived, and
stored on a computer, as a data set of 32*4,030 - 128,960 units of data.



15
The data contain relatively more information about the groups (19
variables for 99 units) than about the pupils (13 variables for 4,030 units).
Arguably, the group-level variables are also more reliable because they refer
to school or teacher records and are responses from adult professionals,
whereas the responses of pupils are subject to test-performance variation,
recall of family circumstances and arrangements, varying interpretations of
the questionnaire items and so on. Moreover, the pupil-level variables, e.g.,
XROT, have a large-group level component of variation; groups vary a great
deal in their composition (means, standard deviations, etc.) of these
variables. Hence, not only the 19 group-level variables, but also, to some
extent, the 13 pupil-level variables potentially explain group-level variation
among the 99 groups, whereas only the 13 pupil-level variables explain some of
the pupil-level variation in the outcome scores of the 4,030 pupils.
CHAPTER II: MODELS
Variance-Comvonent Models
The hierarchical structure of the data, with pupils nested within
groups, requires a form of regression analysis that takes into account the two
separate sources of variation in achievement. Separation of the variation
attributable to pupils and to schools/classrooms is also of substantive
interest, because the latter is a measure of the size of unexplained
differences among schools/classrooms.
Goldstein (1986), Raudenbush and Bryk (1986) and Aitkin and Longford
(1986) have established the relevance of variance component methods for



16
analyzing data with hierarchies. They address the previously mentioned
problems with the use of ordinary regression methods when the assumption of
independence of the observations is not satisfied.
Analytical Framework
Educational surveys involve hierarchically structured data--pupils
within classrooms within schools within administrative units or regions.
Every classroom (school, region) 'Aas its own idiosyncratic features that
result from a complex of influences, including composition, teaching practices
and management decisions. As a consequence, observations on students (e.g.,
their outcomes) are not statistically independent, not even after taking into
account the available explanatory variables. This condition violates the
assumption of independence for ordinary regression (OLS).
By comparison, variance component models are an extension of
ordinary regression models that allow more flexible modelling of variation:
within school or classroom and between schools or classrooms. Pupils are
associated with (unexplained) variation, but this variation has a consistent
within-classroom component that itself has a within-school component, etc.
Schools vary, classrooms within schools vary and pupils within classrooms
vary. Consider the regression model for data with two levels of hierarchy
(pupils i within classrooms j):
Yij  ca + xjj + .zij +  Eij                                   (1)
where a, P and - are (unknown) regression parameters, x and z are explanatory
variables, y is the outcome measure and the random term e is assumed to be a



17
random sample from a normal distribution with a mean of zero and an unknown
variance o2. Variation among the classrooms can be accommodated in the
"simple" variance component model:
yij - a +  x +  zij + aj + fij                                    (2)
where the a's form a random sample from a normal distribution with a mean of
zero and an unknown variance r2 , and the a's and the e's are mutually
independent. The covariance of two pupils within a classroom is r2
(correlation r2/[r2 + o2]). If we knew the a's, we could use them to rank
the classrooms. Model (2) has the form of analysis of variance (ANOVA) with
distributional assumptions imposed on the a's. The advantages of this
assumptioi, are discussed by Dempster, Rubin and Tautakawa (1981), who use the
term "borrowing strength" in estimating the effects of small groups, and by
Aitkin and Longford (1986).
In this model, each school has a uniform offect on the pupils within
it. As this assumption may be unrealistic, a more flexible model is needed
that allows not only the school means but also the school regression
coefficients to vary, as some schools may be more "suitable" for pupils with
certain backgrounds than others. This corresponds to variation in the
within-school regressions of y on x and z. This situation can be suitably
modelled as
Yij   a + PXij + 7zij + aj + bjxij + cjzij + 4ij            (3)
or
Yij   a a + Pxij + yzij + aj + bjxij + eiJ-                (4)



18
The classroom-level random effects (aj, bj) are assumed to be a
random sample from a normal distribution with a mean of zero and an unknown
variance E(2). Here E(2) involves three parameters: the variances of a and b
and their covariance. Extensions to larger numbers of explanatory variables
and to more complex hierarchies are described in the literature (e.g.,
Goldstein 1987; Longford 1987; Raudenbush and Bryk 1986).
The maximum likelihood estimation procedures for such models used
in this paper are based on the Fisher scoring algorithm (Longford 1987)
implemented in the software VARCL (Longford 1986). It provides estimates of
regression parameters and (co-) variances, together with standard errors for
them, and the value of the log-likelihood.
Variance CouDonent Models Compared with OL0
Variance component methods involve the explicit modelling of student
and group variation and afford flexibility in modelling the group variation,
something that ordinary regression cannot do. The specification of a variance
component model is necessarily more complex than is the case with ordinary
regression. In standard situations, the analyst first declares the list of
the regression variables involved in explaining the outcome for a typical
group. Next the analyst declares a sublist of this list that contains the
variables for which the within-group relationships are hypothesized to vary
from group to group. The full list of variables, referred to as the "fixed
part," is analogous to the list of the explanatory variables in ordinary
regression. The sublist (random part) may contain only pupil-level variables,
that is, variables that take on different values for students attending the



19
same class. Variables measured at the classroom level whose values are
constant for all students in a classroom cannot be specified in the random
part of the model, because within-group regression coefficients on group-level
variables cannot be identified.
Variance component models involve two kinds of parameters. The
fixed effects parameters refer to the regression relationship for the average
group. Their interpretation is analogous to the regression parameters in
ordinary regression. The random effects parameters are variances and
covariances that describe the between-group variation in the regression
relationship. Of prime interest are the sizes of the variances. Zero
variance of a regression coefficient corresponds to a constant relationship
across the groups. To obtain information about the variation, we require, in
general, a substantially larger number of pupils and groups than we do for the
regression parameters. We can therefore expect to find that a small random
part, containing only a few variables, provides a sufficient description of
the variation, whereas the fixed part may contain most of the available
explanatory variables.
One important aspect of the separation of the two sources of
variation is the ability to distinguish between pupil- and ;r-,up-level
variation. This aspect comes out very clearly in the follow.ng examples: it
turns out that we have abundant group-level information, i.e., a good
description of the between-group variation, but a much larger proportion of
the student-level variat_on remains unexplained.
To fix ideas, we consider first a specific model:
Yij - Y-kxij,k Pk + dj + Eij                                (5)



20
where the indices i - 1, ..., nj, j - 1, ..., N2 and k - 0, 1, ..., Kg
represent the pupils, groups and variables, respectively. The O's are the
regression parameters, and the d's and e's are the group- and pupil-level
random effects, assumed to be independent random samples from
the normal distribution with zero means and variances a2 and r2 We will
assume throughout that o is the intercept, i.e., xij.0 - 1. Analogously with
the ordinary regression, we can define the R2 as the proportion of variation,
explained as
R2 _ 1 - (o2 + r2)/(o2raw + r2raw)                           (6)
where the subscript "raw" refers to the variance estimates in the "empty"
variance component model:
gi  - u + d  + 'ij.                                          (7)
It is advantageous, however, to define two separate R2s that refer
to the two levels of the hierarchy for pupils and groups, respectively:
Rp2          2 (1 - 02)/02raw                                (8)
Rg2 _ (1 _ r2)/ 2raw,                                        (9)



21
CHAPTER JIII: SCHOOL EFFECTS QS HATEKATICS LEARNING
Two questions that educators frequently ask are how much student
achievement increases over the course of a year and whether schools affect
growth in achievement differentially. In this section, we use the pretest
(XROT) and student posttest (YROT) to address these questions. We also
demonstrate, using simple examples from the data, the differences between
ordinary regression, simple variance component analysis and variance component
"azalysis using random coefficients. In the next section on the results of our
analysis, we apply these techniques to the complete data set, using more
complex models.
Model 1: OrdinarI Regression OLTS)
In the present analysis, for a data set obtained by listwise
deletion with respect to a set of variables considered below (a procedure that
leaves 3,136 pupils in 88 schools), we have for the simple ordinary regression
of posttest (YROT) on pretest (XROT), as per equation (1) with a single
explanatory variable,
Yij  - a + 6xij + eij                               (10)
and
YROT - 4.892 + .818 XROT.                           (11)
(.015)



22
In this model, identification of pupils within schools is completely
ignored; instead, the pupils are assumed to be a randomly drawn sample from
the population of all pupils in the given grade in the country. A pupil with
a given pretest score XROT is expected to score 4.892 + .818XROT on the
posttest. The standard errors for the regression estimates will be given
throughout the paper in parentheses in the line below the regression
parameters. For example, .015 above is the standard error for the regression
coefficient on XROT, .818. The corresponding t-ratio is .818/.015 - 54.5.
The computation of R2 follows:
X raw- 82.80
raw
a2    - 42.56,
so that R2   1  _ 02/a2raw - 1 - (42.56/82.80) - .486.
Model 2: (Simple) Variance Conmonent Model (VCS)
To take into account the group-level variables, we choose a simple
variance component model ("simple" in that it does not contain variable
slopes):
yij     + dj + eij                               (12)
a2 a- 55.56
raw
r2ra-  25.65.
raw
The variation in posttest scores has a substantial group-level
component. That is, the "total" variance is 81.21 (55.56 + 25.65), of which
.316 (25.65/81.21), the variance component ratio, is attributed to group-level
effects. The variance component regression model is given as:



23
YROT - 5.841 + .699 XROT                       (13)
(.018)
o2 _ 38.55
r2    4.78,
so that we have R2 _ 1 - (43.33/81.21) - .466, and
= _ 1 - 38.55/55.56 - .306
R 2 _ 1 - 4.78/25.65 - .814.
g
Thus, if we make allowances for the within-school correlation of the
posttest scores, we obtain a prediction formula for the posttest score
(YROT - 5.841 + .699XR0T) that is substantially different from the OLS
regression described in equation 11. Note, also, by how much the school-level
variation has been reduced.
Table 3 presents the comparison between the simple OLS and simple
variance component models. Clearly, the latter extension of the R2 for
variance components is more informative. The pretest score XROT is a powerful
predictor of the posttest score YROT. However, whereas it explains more than
80% of the variation among the groups, it explains only 30% of the pupil-level
variation. The school-level variation in the outcome scores reflects the
pretest score to a great extent. Some of the remaining within-group variation
may be explained by the other explanatory variables, but they are not likely
to have as dominant an effect as the pretest score does.



24
The variation associated with the testing and scoring procedure,
which could be demonstrated in an experiment with repeated administration of
the test, use of alternate forms, etc., will remain as a component of the
pupil-level variation. Thus, whereas the group-level variation can
potentially be reduced to 0, the pupil-level variation has a component that
cannot be explained by any explanatory variables. In ideal circumstances (and
in our case, almost), we can explain completely why/how schools vary; the
variance of schools in the later models is very small. We carnot, however,
explain the pupil-level variation completely; there will always be an
unexplainable within=pupil variation because of fluctuations in performance,
distractions, guessing and so on. Since every pupil provides only one outcome
score, the within-pupil and within-group variation cannot be separated.
The raw variance component ratio is .316, but with the model with
the pretest score, the ratio drops to .110. If the pretest score is ignored,
the groups appear to have substantial differences. At the same time, the
schools appear to bi much more similar (homogeneous) once we take account of
the pretest scores, i.e., they are much more similar in the way they "convert"
initial ability into outcome.



25
Table 3: Comparison of OLS and VCS Models
of Grade 8 Mathematics Posttest Predicted from the Pretest,
Thailand, 1981-82
_ Method
Models                               OLS              VCS
Empty model
a2raw                          82.80           55.56
raw
2
7 raw                           -            25.65
Regression model
Intercept                            4.892           5.841
Coefficient                          0.818           0.699
St. error coeff.                     0.015           0.018
a2                             42.56           38.55
7T                              -              4.78
nR2                            0.486            -
RD2                             _              0.306
R g2                                           0.814
If a group-level explanatory variable were added to the regression
model, it would result in a reduction of only the group-level variance, which
has already been substantially reduced. Therefore there is less scope for
important group-level explanatory variables than for pupil-level ones. Among
the pupil-level variables there might be ones that explain a great deal of the
remaining pupil-level variation.
Inclusion of a pupil-level variable in the regression model will
cause a reduction in both the pupil- and group-level variances. The relative



26
sizes of the reductions of the two variances will depend on how the variation
in the explanatory variable decomposes into between- and within-group
variance. Hence, potentially the most important pupil-level explanatory
variables are those with little between-group variation.
Model 3: Variable Slopes Model
The variance component model discussed above can be further
generalized into a model that allows variable slopes on the pretest:
i  ~ -    + P 1 Xj xi + doj + dlj(xij - x) + alj,               (14)
where (doj, d1j) form a random sample from a normal distribution with a mean
of zero and an unknown variance, Ed; x is the sample mean for x; and e's are a
random sample from a normal distribution with a mean of zero and an unknown
variance, a2. The maximum likelihood estimates for this model are:
-0 - 5.832
fi  -   .687 (.019)
'2 - 38.367
Ed - Var (do,dl) - 4.947
.0805    .00416
The software VARCL used for maximum likelihood estimation in variance
component models estimates the square roots of the variances in
Ed and produces standard errors for these estimates:



27
Zd,ll - 2.224 (.202)
Zd,22 -   .0645 (.0338)
Zd,12 -   .0805 (.0311).
Model 4: GoMDarison of the Models
Now we test Model 3 against Models 2 and 1. First, we compare Model
3 and Model 2.  The value of the deviance (-2 log-likelihood)Wk is 20,496.3.
Using the conventional t-ratio, we conclude that the slope-variance Ed,22 is
not significantly different from 0, so that we can adopt the simple variance
component model.
More formally, we can use the likelihood ratio test to compare the
two variance component models. The deviance for the simple Model 2 is
20,499.9, 3.6 times higher than in the case of the variable slopes Model 3.
To determine the significance of this difference, it is necessary to determine
the number of degrees of freedom from the "free" parameters. The simpler
model is obtained from the latter model by constraining to zero the slope
variance %d,22 and the slope-by-intercept covariance Ed,12; these are the two
additional free parameters that set the degrees of freedom equal to 2. Hence
the statistic x2 has 2 degrees of freedom, and we can declare that we have
found insufficient evidence for a variable slope of the posttest on the
V This statistic is used to assess how well the model represents the data.
For two models where one is a special case of the other, the differences of
their deviances has a chi-square distribution, with the number of degrees of
freedom equal to the difference in the number of free parameters in the two
models.



28
pretest among the schools. That is, the schools are fairly uniform in their
conversion of pretest scores into posttest scores.
Next we compare the simple variance component model (Model 2) with
the ordinary regression model (Model 1). The differences among the schools,
described by the variance r2 in the simple variance component model, are
substantial and statistically significant; the formal likelihood ratio test
for the hypothesis that r2 > 0 is obtained by comparing the deviances of the
ordinary regression and the simple variance component models. The ordinary
regression deviance (-2 log-likelihocd, which is not the same as the residual
sum of squares) is equal to 20,662.6, 162.6 higher than the deviance for the
simple variance component model (X2 with 1 degree of freedom). Therefore we
reject the ordinary regression model in favor of the variance component model.
Further, the t-ratio for r is large.
Making inferences about relationships that vary from group to group
is of substantive importance in studies of school effectiveness. Schools are
expected to vary in their performance after accounting for differences in the
initial ability of th3 pupils, but other more complex patterns of
between-school variation may arise: schools may be relatively more successful
in teaching children with certain background characteristics, and they may
either exaggerate or reduce the differences among the pupils at enrollment.
The relationships among variables are intimately connected with
variance heterogeneity. By way of illustration, we consider the variable
slope model discussed above. The fitted variance of an observation is
38.367 + 4.947 + 2*(XROT - 8.912)*.08054                          (15)
+ (XROT - 8.912)2 *.00416.



29
It is a quadratic function of the pretest. The minimal variance occurs for
XROT* - 8.912 - .0805/.0042 - -10.45 and is equal to 41.75. Only two pupils
in the whole sample have scores lower than XROT*. Larger values of the
explanatory variable XROT are associated with larger variance. For XROT - 9
(near the mean), the fitted variance is 43.33, and for XROT - 30 (near the
sample maximum), the fitted variance is 48.56. It would appear that for
low-ability pupils, the choice of school is slightly less important than for
high-ability pupils.  We have to bear in mind, however, that we are dealinig
with an observational study, not with an experiment, and in reality pupils, or
their parents, do not have complete freedom of choice over the school. Thus a
causal statement, or a prediction about a future manipulative procsdure, can
be made only under the conditi .n that all the other circumstances in the
educational system remain intact. This assumption is usually very
unrealistic.
Summar
The comparison of the regression relationship (fixed effects) is
instructive. We have
(i)   Ordinary regression
YROT - 4.892 + .818*XROT
(.015)
(ii) Simple variance component model
YROT - 5.841 + .699*XROT
(.017)



30
(iii) Variable slopes
YROT - 5.832 + .687*XROT.
(.019)
The estimate of the regression coefficient on XROT in ordinary regression is
substantially different from the estimates in the two variance component
models. Ignoring the hierarchical structure of the data would lead to
different conclusions, say, in predicting the posttest (YROT) from the pretest
(XROT). In other words, whereas the OLS estimate could be interpreted to
mean that each point on the pretest is worth .82 poirsts on the posttest, the
VCS estimate more accurately places this value at .69 points.
CHAPTER IV: PUPIL BACKGROUND AND SCHOOL/CLASSROOM EFFECTS ON LEARNING
Overvie
In this section we use the complete data set to estimate the
effects of student background and school/classroom variables on achievement in
mathematics. The approach taken is often referred to as a "value-added"
approach, since the purpose is to explain posttest achievement after the
effects of prior learning (pretest achievement) have been taken into account.
Our intent is to obtain the most parsimonious simple variance component model
of grade eight mathematics learning in Thailand, given the data.
Because of missing data, we build the model conservatively, as
follows. First, we start with the data set obtained by listwise deletion with
respect to all 32 variables (including the outcome YROT and the pretest XROT),



31
fit a regression model to this data set, and apply a conservative criterion
(to be specified below) to exclude variables from the obtained regression
formula, so that we end up constructing a restricted set of explanatory
variables. We apply listwise deletion to this restricted set of variables, a
process that leads to a larger sample of pupils and schools. For this new
data set, we again fit the regression model, simplify the regression formula,
if possible, and continue on until no further reduction of the set of
variables and extension of the data set obtained by listwise deletion are
possible.
Usually it cannot be assumed that *che unavailable data are missing
at random, i.e., the distribution of a variable among the pupils from whom we
obtain valid responses is similar to the distribution among the pupils whose
responses are not available (missing). In educational surveys, typically
higher ability pupils, those with higher social status, etc., tend to have
higher response rates, the implication being bias in the estimates of certain
population means, as well as in the regression coefficients obtained from
simple regression. Missingness at random is an unnecessarily stringent
criterion for ensuring that the omission of the subjects with missing data has
no effect on the results of a regression analysis. It is sufficient to have
condit:ional randomness, given the explanatory variables. It means that for
any combination of explanatory variables, the distribution of the outcome
among the pupils in the sample is identical to that for those excluded from
the sample by the listwise deletion procedure. Intuitively, such an
assumption becomes less stringent the more explanatory (conditioning)
variables are used. On the other hand, a larger set of explanatory variables



32
implies a larger proportion of subjects whose data are not used in the
analysis.
An indication of the extent to which the criterion of conditional
randomness is relevant can be deduced from comparisons of model fits for two
different samples: the maximal sample obtained by listwise deletion with
respect to the set of explanatory variables used in the considered model, and
the sample obtained by listwise deletion with respect to a more extensive, or
complete, set of explanatory variables. In a few such comparisons, reported
below, we find close agreement in several pairs of such analyses.
Multi4le Reftression Models
The response rate for the 13 pupil-level variables is between
93-100%. There is no obvious pattern of missingness among the pupils;
complete pupil-level records are available for 3,466 individuals (86%). The
group-level data are available for between 78-99 schools, but only 60 schools
have complete records, and within these schools, only 2,076 pupils also have
complete pupil-level data (51.5%). We begin by fitting the simple variance
component models (VCS), i.e., models involving no variable slopes, to the data
set.
First model: Regression with all variables. Listwise deletion with
respect to all 32 available variables results in a data set containing 2,076
pupils in 60 schools. The ordinary regression fit (OLS) of the posttest on
the pretest is
YROT  - 4.882 + .817*XROT,      a2 - 42.20,
(.017)



33
which is in close agreement with the OLS fit reported above for the larger
data set (3,136 pupils in 88 schools). The corresponding simple variance
component model fit is:
YROT - 5.670 + .720*XROT
(.020)
02   38.79
r2_ 4.02.
Compared to the larger data set, equation 13, we find some
discrepancies: the fitted regression slope for the smaller data set is higher
(.720 versus .699) and the group-level variance is smaller (4.02 versus 4.78).
The variation of the slope on XROT is not significant in either sample, but it
is two-and-a-half times as great in the larger data sez (.00416) than in the
smaller one (.00166). It appears that the 28 schools added to the data are
more likely to have lower regression slopes and contain proportionately more
schools at the extremes (very "good" or very "bad"), because the larger sample
has larger group-level variance, r2. We emphasize that all these differences
may arise purely by chance, rather than as a result of non-random missingness
of the data, but they can have a substantial effect on the inferences drawn.
The OLS and VCS model estimates for the 2,076/60 data using all the
explanatory variables are given in Table 4. The dominant explanatory power of
the pretest score XROT is obvious, as evidenced not only by the t-ratio for
its regression coefficient (32.38 for OLS and 30.80 for VCS), but also by the
comparison of the variance component estimates across models. The raw
variance component estimates are:
a2raW - 57.30
raw
T2 - 28.83.
raw



34
Table 4: OLS and VCS Model Estimates for 2,076 Students and
60 Classrooms/School- Using All 31 Explanatory Variables,
Thailand, 1981-82
OLS                              VCS
Variable           Estimate      St. Error          Estimate       St. Error
Studet Lvl
GRAN') MEAN          18.603            -               19.717           -
XROT                    .680          .021               .647           .021
XAGE                  -.080           .016              -.077           .016
XSEX                    .732          .301               .969           .319
YFOCCI                  .174          .431               .033           .434
-.631           .462             -.646           .460
-.178           .541             -.239           .542
YMEDUC                  .021          .327              -.039           .325
-.129           .562             -.157           .556
-.686           .661             -.899           .663
HCALC                 -.120           .310              -.217           .309
YHLANG                  .203          .315               .012           .341
YMOREED               1.087           .546              1.074           .541
1.570          .545              1.537           .541
1.638          .593              1.610           .589
YPARENC                 .225          .137               .249           .136
YPERCEV               -.980           .160            -1.020            .161
YFUTURE                 .574          .168               .526           .167
YDESIRE                 .277          .236               .228           .233
Group Level
spci8l                 .061           .042               .073           .060
senrolt                .422           .263               .417           .386
sstream               -.426           .358             -.500            .512
sdavsvr               -.006           .020              -.010           .029
sputear               -.152           .051             -.170            .075
squalmt               1.023           .342              1.029           .494
tedmath               -.035           .037             -.044            .053
tsex                  -.580           .336             -.619            .481
tage                   .009           .032             -.001            .046
texptch                .014           .043               .038           .064
tnstuds                .035           .018               .039           .025
tmthsub               1.725           .432              1.941           .628
txtbook               1.602           .338              1.650           .490
(continued)



35
QLS                            vcs
Variable                  Estimate     St. Error         Estimate     St. Error
cefeed                .148         .203              .209         .290
tworkbk            -1.104          .218           -1.124          .314
�vismat               .380         .331              .461         .480
Sadmini          -.003          .004            -.003          .006
tgrderl             -.037          .012            -.039          .016
tseatl                .011         .005              .011         .007
Variance            38.031        6.167              -             -
Pupil-level variance  -                            36.809          -
Pupil-level sigma     -                             6.067          -
Group-level variance  -                             1.317          -
Group-level sigma     -                             1.148        0.192
Deviance              -                        13424.947           -
The pretest score XROT on its own leads to a reduction of these variances to
38.79 (Rp2 - 32%) a."d 4.02 (R 2 - 86%).  However, the other 30 variables
reduce the  1pill-level variance only marginally to 36.8 (Rn2 - 36%).  The
group-level variance is almost saturated-1.32 (Rg2 - 95.5%). It appears that
we have abundant information about the groups, but we are less successful with
an explanation, or suitable description, of the pupil-level variation.
The relatively large number of group-level variables raises a
concern about multicollinearity, i.e., competing alternative descriptions of
the data. To deal with this problem we apply a conservative criterion for the
exclusion of explanatory variables from our models. We regard a variable as
not "important" for the fixed part of the VCS model if the t-ratio of its
regression coefficient is smaller than 0.9 at the first stage of model
reduction and 1.0 thereafter. In the first round of simplifying the model, we
use the 0.9 criterion to exclude two pupil-level social class variables
(calculator in the home [YHCALC] and use of the language of instruction in the



36
home [YHLANG]) and six group-level variables: four indicators of resource
inputs (number of days in the school year [sdavsyrl, teacher's postsecondary
mathematics education [tedmathl, teacher's age [tage], and teaching experience
(texotchl) and two teaching process variables (frequent use of individual
feedback [cefeed] and time spent in routine administration [tadminl]) from the
full list of 31 variables.
Second model. Next we estimate both the OLS and VCS models using
this shorter list of 23 variables. The results are shown in Table 5.
Exclusion of the eight variables (eight degrees of freedom) has virtually no
effect on the retained regrassion parameters and their standard errors
(compare Tables 4 and 5); the exception is an indicator of instructional
materials (use of commercial visual materia's [tvismat]), which now fails to
meet the inclusion criterion. The increase in the variance components is only
marginal, in particular for the group-level variance. The difference in
deviances is 3.3 (X2 ).
Again we obtain the largest data set obtainable by listwise deletion
with respect to the retained variables; this procedure yields data for 2,804
pupils in 80 schools. We then compute the variance component analysis for
this data set; the results are given in Table 6. We see that the regression
coefficients for the pupil-level variables are stable across the data sets (as
compared with Tables 4 and 5), but the discrepancies for the group-level
variables are substantial. There are two separate, but possibly
complementary, explanations for these discrepancies: multicollinearity and
non-random missingness of data. Multicollinearity would cause the regression
estimates to be sensitive to changes in the data, in our case to the inclusion



37
Table 5: OLS and VCS Model Estimates for 2,076 Students and
60 Classrooms/Schools Using 23 Explanatory Variables,
Thailand, 1981-82
OLS                             VCS
V&riable          Estimate       St. Error         Estimate       St. Error
Student Level
GRAND MEAN           18.118           -              18.370
XROT                   .685          .020               .650          .021
XAGE                 -.080           .016             -.076           .016
XSEX                   .723          .299               .958          .318
YFOCCI                 .118          .426               .033          .432
-.621          .457             -.651           .457
-.139          .538             -.212           .541
YMEDUC                 .037          .326             -.028           .325
-.068          .559             -.115           .555
-.604          .656             -.855           .660
YMOREED               1.115          .545             1.083           .540
1.568          .543             1.521           .540
1.666          .591             1.609           .589
YPARENC                .238          .137               .255          .135
YPERCEV               -.970          .160            -1.010           .161
YFUTURE                .570          .168               .526          .167
YDESIRE                .287          .235               .234          .233
GrouR Level
s5ci81                 .050          .038               .058          .056
senrolt                .509          .251               .540          .373
sstre am             -.441           .324             -.503           .472
sRutear              -.178           .046             -.198           .068
sgualmt               1.062          .327             1.090           .480
tsex                 -.518           .314             -.536           .460
tnstuds                .036          .017               .038          .025
tmthsub               1.802          .409             2.094           .604
txtbk                1.649           .315             1.673           .463
tworkbk             -1.028           .204            -1.039           .300
tvismat                .368          .322               .393          .473
torderl              -.040           .010             -.043           .014
tseatl                 .010          .005               .011          .007
Variance             38.108         6.173             -              -
Pupil-level variance  -                              36.855         -
Pupil-level sigma       -                             6.071         -
Group-level variance  -                               1.351          -
Group-level sigma       -                             1.162           .191
Deviance                -                         13428.295          -



38
of over 700 new observations. As an alternative, the discrepancies could
arise as a result of the non-random missingness in our data, i.e., if the two
data sets have genuinely different regression characteristics. A suitable
indication, although not a fool-proof check, for the latter possibility is
obtained by fitting the models with identical specifications for the different
"working" data sets. We have fitted the reduced second model (Table 5) to the
larger data set (Table 6), and although we obtained different values for the
group-level regression coefficients, it turns out that the reduced list of
variables also provides an adequate description for the data (as judged by the
likelihood ratio criterion). The pupil-level regression coefficients differ
only marginally.
We conclude, therefore, that multicollinearity is the more likely
cause of the discrepancies in the estimates: we have too many group-level
variables, so that the parameter estimates are subject to large fluctuations
when small changes are made in the data. The explanatory variables provide
sufficient conditioning for the outcome data to be missing at random, given
the available explanatory variables.
In keeping with According to our exclusion criterion (t ratio < 1),
we now delete from the fixed part of the model six group-level variables.
Four are conventional material and non-material input variables (district
level per capita income rsDci81], teacher gender [tsex], class s'ze [tnstuds],
and use of commercial visual materials [tvismat]) and two are organization and
process variables (student time doing seatwork [tseatl] and ability grouping
istrem).



39
Table 6: OLS and VCS Model Estimates for 2,804 Students and
80 Classrooms/Schools Using 23 Explanatory Variables,
Thailand, 1981-82
O'S                          . VcS
Variable          Estimate      St. Error         Estimate      St. Error
Student Level
GRAND MEAN          17.659           -              17.314           -
XROT                   .699         .017               .634          .019
XAGE                 -.079          .014             -.073           .014
XSEX                   .746         .251             i.103           .271
YFOCCI                 .197         .363               .101          .367
-.403          .389             -.458          .386
.089         .458               .085          .458
YMEDUC                 .306         .279               .293          .276
.088         .465               .142          .458
-.018          .567             -.309          .566
-YMOREED               .861         .476               .786          .467
1.086          .475             1.015          .468
1.617          .519             1.542          .512
YPARENC                .388         .118               .375          .116
YPERCEV             -1.083          .137            -1.131           .136
YFUTURE                .576         .142               .533          .141
YDESIRE               .493          .201               .439          .198
Group Level
spci8l               -.029          .033             -.025           .057
senrolt               .437          .187               .481          .331
sstream              -.417          .275             -.422          .473
sputear              -.095          .032             -.110          .058
squalmt               .698          .246               .784         .429
tsex                 -.038          .266               .014         .1t63
tnstuds               .012          .014               .020         .023
tmthsub              1.836          .344             2.398           .593
txtbk                 .948          .266               .978         .461
tworkbk             -0.500          .167             -.499           .291
tvismat               .353          .269              .363          .468
torderl              -.024          .008             -.027           .013
tseatl                .005          .004               .006         .006
Variance            37.949         6.160             -             -
Pupil-level variance  -                             35.868         -
Pupil-level sigma      -                             5.989         -
Group-level variance  -                              2.285         -
Group-level sigma      -                             1.512         0.174
Deviance               -                         18088.395         -



40
ThLd model. As before, we estimate this model with both the
smaller and larger data sets. The estimates from the OLS and VCS models using
the former reduced list of variables are given in Table 7; the same schools
and pupils are involved as for Table 6. For the latter, larger data set of
3,025 students in 86 schools, we fit the reduced modal (17 variables) and
present the results in Table 8. Again, the difference in deviances (3.5, X62)
is small. The effects of non-random missingness can be checked by comparing
the estimates in Tables 7 and 8. Applying our exclusion criterion to the
%'ariables in Model 3, we find that no further reduction of the list of
explanatory variables is possible.
Note that, because of the relatively small number of schools, the
appropriate conclusion about the 14 group-level variables we deleted is that
"we found insufficient evidence" of a systematic effect of these variables,
rather than "our analysis disproves their effects."   Further, a different
modelling scheme could lead to a different "minimal" set of important
explanatory variables. Because of collinearity, there may be a set of
alternative regression formulae that give a model fit that is not
substantially inferior to the one given in Table 8 in terms of the deviances.
A summary of the results of these analyses is provided in Table 9.
In all the models, student background characteristics are important
determinants of mathematics learning over time. School-level resources also
appear to have an important impact on achievement, with students in the larger
schools learning more than students in the smaller schools and students in
schools with a higher percentage of teachers qualified to teach mathematics
learning more than students in schools with a lower percentage of qualified
teachers; however, students in the schools with a higher student/teacher ratio
also learned more.



41
Table 7: OLS and VCS Model Estimates for 2,804 Students and
80 Classrooms/Schools Using 17 Explanatory Variables,
Thailand, 1981-82
OLS                             VCS
Variable          Estimate       St. Error         Estimate       St. Error
Student Level
GRAND MEAN           17.321           -              17.694            -
XROT                   .704          .017               .635          .018
XAGE                  -.077          .014             -.073           .014
XSEX                   .676          .247              1.086          .270
YFOCCI                 .181          .357               .085          .365
-.419          .387             -.465           .385
.105          .455              .082           .457
YMEDUC                 .293          .280               .238          .276
.112          .465              .154           .458
.014          .563             -.297           .564
YMOREED                .869          .476               .786          .467
1.128          .476             1.027           .468
1.666          .520             1.560           .512
YPARENC                .393          .117               .377          .116
YPERCEV             -1.076           .137            -1.130           .136
YFUTURE                .592          .142               .537          .141
YDESIRE                .477          .201               .431          .197
Group Level
senrolt                .285          .164               .367          .289
soutear              -.074           .030             -.094           .054
squalmt                .808          .239               .880          .427
tmthsub               1.950          .329             2.562           .576
txtbook                .948          .259               .946          .458
tworkbk              -.433           .160             -.402           .284
torderl              -.022           .006             -.024           .010
Variance             38.065         6.170             -              -
Pupil-level variance  -                              35.871          -
Pupil-level sigma       -                              5.989         -
Group-level variance  -                                2.429         -
Group-level sigma       -                              1.558         0.176
Deviance                -                         18091.983          -



42
Table 8: OLS and VCS Model Estimates for 3,025 Students and
86 Classrooms/Schools Using 1/ Explanatory Variables,
Thailand, 1981-82
OLS                             VCS
Variable          Estimate      St. Error         Estimate      St. Error
Student Level
GRAND MEAN          17.238           -              17.536           -
XROT                   .695         .017               .629          .018
XAGE                 -.075           .014            -.071           .014
XSEX                   .658         .238             1.053           .260
YFOCCI                 .152         .343               .074          .351
-.415          .373             -.435          .373
.115         .443               .123          .446
YMEDUC                 .371         .269               .343          .265
.056          .449              .073          .442
.066          .554            -.259           .555
YMOREED                .854         .461               .755          .453
1.195          .459             1.064          .452
1.703          .500             1.532          .494
YPARENC                .361         .113               .347          .112
YPERCEV             -1.140           .132           -1.191           .132
YFUTURE                .614         .137               .543          .136
YDESIRE                .484         .194               .459          .190
Grout Level
senrolt               .271          .160               .350         .279
sRutear              -.076          .029             -.094           .052
sgualmt               .847          .232               .903          .410
tmthsub              1.968          .327             2.546           .566
txtbk                1.047          .250             1.071           .437
tworkbk              -.434          .157             -.417           .275
torderl              -.023          .006             -.025           .010
Variance            38.271         6.186             -             -
Pupil-level variance  -                             36.138         -
Pupil-level sigma      -                             6.012         -
Group-level variance  -                              2.353         -
Group-level sigma      -                             1.534           .169
Deviance               -                         19537.962         -



43
Classroom variables also affect achievement. Students in non-
remedial classes learned more than students in remedial classes; students in
classes where the teacher used textbooks more often learned more than students
in classes in which textbooks were not used. On the other hand, workbooks and
teacher time spent maintaining order were negatively related to learning.
Table 9: Summary of Tables
Tables
4       5       6        7       8
OLS variance                  38.03   38.11   37.95   38.07   38.27
St. error                  6.17    6.17    6.'.6   6.17    6.19
VCS pupil-level variance      36.81   36.96   35.87   35.87   36.14
Sigma                      6.07    6.08    5.99    5.99    6.01
VCS group-level variance
For G. mean                1.32    1.35    2.29    2.43    2.35
Sigma                      1.15    1.16    1.51    1.56    1.53
St. error for sigma        0.19    0.19    0.17    0.17    0.17
Sample size
Pupils                    2,076   2,076   2,804   2,804   3,025
Groups                       60      60       80      80       86
Several researchers have considered the contextual effects in
educational studies involving multi-level data (see Raudenbush and Bryk 1986).
In our case, contextual analysis would involve using within-school means of
pupil-level variables as school-level variables. However, as was pointed out
earlier, we have abundant school-level information (14 school-level variables



44
for 99 schools), and contextual analysis would only aggravate further the high
level of confounding of the school-level variables. Contextual variables are
more relevant in studies where the aim is to produce, or at least consider, a
ranking of schools. The ranking may depend crucially on the explanatory
variables used and can often be affected by even the inclusion of variables
with statistically insignificant regression coefficients. This point
highlights the need :o select models based on educational theory rather than
on purely statistical criteria that contain a great deal of arbitrariness.
Modelling of Group-Level Variation (Random Slopes and Random Differences)
Simultaneously with reducing the fixed (regression) part of the
variance component model for our data, we also need to explore extensions of
the random part to obtain a better description of the group-level variation
than the one offered by the group-level variance. We concentrate first on a
reduction of the fixed part to a shorter list of explanatory variables
because: (i) the school-level variation is rather small and (ii) in the models
with complex descriptions of variation, the estimates of fixed effects and
their standard errors differ very little from those obtained so far (Table 8).
In the variance component models fitted so far (Tables 4-8), the
within-group regressions are assumed to be constant across groups, with the
exception of the intercept (position), which has a fitted variance of 2.35.
More generally, the regression coefficients with respect to any of the
pupil-level variables may be allowed to vary across the groups. These
variables, selected from the variables included in the fixed part, form the
random part of the model. The group-level variables are not considered for



45
the random part, because within-group regreFsions with respect to such
variables cannot be identified.
Variance component models closely resemble the models for the
analysis of covariance. The simple variance component models correspond to
ANOCOVA models, with no interactions of covariates with the grouping factor.
The (complex) variance component models with variable within-group regressions
(slopes and/or differences) correspond to ANO RVA models with group x
covariate interactions. The difference between the variance component and
ANOCOVA models is in their emphasis on the description of variation as opposed
to differences among the groups and in the assumption of the rormality of the
group effects in the former. The model specification in both models is
analogous:
a, list of covariates (fixed part),
b, sublist of covariates that have interactions with the grouping
factor (random part).
We now turn to modelling the random part. For a continuous variable
included in the random part, the within-group regression slopes with respect
to this variable are assumed to vary randomly (and to be distributed normally)
with an unknown variance. For a categorical variable included in the random
part, the within-group (adjusted) differences among the categories are
normally distributed. We can consider the "stereotypical" group, for which
the regression is given by the fixed part model (the average regression),
with the regressions for the groups varying around this average regression.
The deviations of the regression coefficients form a random sample (i.i.d.)
from a multivariate normal distribution. The components of the vector of
deviations (for a group) cannot be assumed to be independent; thus, their



46
covariance structure has to be considered. However, the variances of these
deviations (or random effects) are the main interest.
Data with only a moderate number of groups and wil'. limited numbers
of subjects within groups (classroom sizes), as is the case in this analysis,
contain only limited information about variation, comparable to the limited
information about interactions in models of analysis of covariance. Usually,
information about the covariance structure is even scarcer. Therefore, if
many variances are included in the random part (and estimated as free
parameters), we can expect high correlations among the estimates - large
estimated variances with large standard errors. Moreover, the number of
covariances to be estimated grows rapidly with the number of variarnces, and
many of the estimated correlations corresponding to these covariances are then
close to +1 or -1. The variance matrix with these variances and covariances
is not of full rank, and the random effects are linearly dependent. Therefore
it is important to adhere to the principle of parsimony and seek the simplest
adequate description for group-level variation. In selecting the covariances
to be estimated, we use the guidelines set by Goldstein (1987) and Longford
(1987).
Although selection of a model for the random part involves only
pupil-level variables (inclusion/exclusion), it is more complex than the
selection for the fixed part because constraints can also be imposed on the
covariances. The most general variance component model would involve 17
variances (the number of regression parameters in Table 8) and 17*16/2 - 136
covariances. Fitting such a model is clearly not a realistic proposition.
Thus, model selection has to proceed by building up the random part from
simpler to more complex models. The models fitted are all invariant with



47
respect to the choice of the location of the explanatory variables. In the
computations, all the variables are centered around the overall mean, and the
estimated variance matrix refers to this "centered" parametrization. However,
the variance matrix for a different parametrization is easy to calculate by a
quadratic transformation.
In selecting the model for the random part, we proceed according to
the following stages. For all the models we use the same fixed part as in
Table 8. The estimates and standard errors for the regression parameters
differ very slightly from those in Table 8 for all these models. This fact
justifies post hoc our approach of first settling the fixed part and then
modelling the random parts. First we fit models with one pupil-level variable
in the random part. Using the likelihood ratio test to compare the fitted
model to the model with the simple randon. part (Table 8), we select the
following variables: pretest score (XROT); age (XAGE); motivation (YDESIRE);
and educational expectation (YMOREED).
The first three variables are ordinal and are associated with one
variance each. The likelihood ratio (the difference of the deviances) for
each of the three corresponding models is larger than 3. This criterion is
intentionally very conservative, since we prefer to err on the side of
inclusion. Two parameters are involved - a variance (slope-variance) and a
covariance (slope-by-intercept covariance) - but they are not free
parameters, since they have to satisfy the condition of positive definiteness.
The distribution of the difference of the deviances is x2 2 only if the
correlation corresponding to the covariance is not equal to +1 or -1. The
problem of negative variances is resolved by estimating the square roots of
the variances (sigmas). In the actual computational algorithm, negative



48
sigmas do not arise, and the estimated variance matrix is always non-negative
definite.
Next we fit the VC model with these four variables in the random
part and simplify the random part by excluding variables and setting certain
covariances to 0. The variance associated with the variable XAGE is very
small (.00095), and its square root has a low t-ratio (.75), so that it can be
constrained to 0 (excluded). The implication is a constraint on all the
covariances involving XAGE, which are also set to 0. The three remaining
variables and the intercept are represented by a 6x6 variance matrix: 6
variances and 15 covariances, almost as many parameters as are in the fixed
part. The fitted variance matrix is:
Intercept           2.581
XROT                  .0143     .00558
YMOREED   Cat.2       .191      .0388       .812
Cat.3      .519       .0439      .0621    1.032
Cat.4      .384       .0354     -.0241     .261    1.032
YDESIRE               .0863    -.0127     -.307      -.303    -.346    .677
The decrement in deviance as compared with the VCS model (Table 8) is only
13, a result that hardly warrants the addition of these 21 parameters in the
model.
The software used provides standard errors for the square roots of
the variances (sigmas and diagonal elements of the matrix) and for the
covariances. The sigmas and their standard errors are:



49
Intercept    XROT             YMOREED             YDESIRE
cat. 2   cat. 3   cat. 4
Sigma           1.607      .0747    .901    1.175    1.016         .828
St. error        .176      .0261    .429       .451      .640      .295
The standard errors for the covariances involving XROT and
categories of YMOREED (rows 3-5 in column 2) are between .059 - .063 and for
those involving YDESIRE and YMOREED (columns 3-5 in row 6) are .56 - .62.
Since each of these covariances has a small t-ratio, they are constrained to 0
in the next model. The following estimated variance matrix is obtained (the
sigmas and their standard errors are given to the right of the variance
matrix):
Variable                          Matrix                          Sigma  St. Error
Intercept       2.237                                             1.496   .173
XROT              .0141   .00343                                    .0586  .0317
YMOREED Cat. 2  .199       0       .0230                            .152   .639
Cat. 3  .601      0       .0791  1.490                    1.221   .439
Cat. 4  .443      0       .003    .392    .826             .989   .753
YDESIRE         .119   -.0178       0        0       0   .746    .864   .276



50
Exclusion of these six covariances leads to an increase in the deviance of
only 1.8. The variance associated with the second category of YMOREED falls
substantially, and it can also be constrained to 0, together with the three
covariances in the same row and column of the variance matrix. Constraining
these four parameters causes an increase in the deviance of only .2. The
reestimated variance matrix is:
Variables                        Matrix                        Sigma   St. Error
Intercept       2.415                                        1.554       .162
XROT              .0455   .00390                               .0625      .0313
YMOREED Cat. 2    0          0    0                              0         0
Cat. 3  1.136        0    0    1.788                 1.337       .341
Cat. 4   .740        0    0    1.157  1,424          1.193       .514
YDESIRE           .304   -.0436   0       0       0    .830   .911        .260
The rank of this matrix is 4 (the two variance matrices given above
are also singular). Thus it appears that another variance parameter can be
constrained to 0. However, the t-ratio for each of the sigmas is high, and
only a complex linear reparametrization of the variables included in the
random part would enable further simplicaticn of the model.
The variance matrix obtained provides a description of group-level
variation in terms of 11 parameters, 5 variances and 6 covariances. However,
the difference between the variances in this model and the corresponding VCS
model is only 11 (for 10 parameters). That result provides further evidence



51
of overparametrization or collinearity in the random part. However, any
attempt to define a suitable model with fewer parameters would necessarily
involve some unnaturally defined variables, which would make interpretation of
the model very difficult.  Ve interpret these estimates as discussed below.
The variation in the slope on XROT provides evidence of an unequal
"conversion" of ability at the beginning of the year into ability at the end
of the year. Such a concl.usion is appropriate only subject to the caveats
discussed in the summary chapter. The slope on XROT is shallower in some
schools, where the initial differences in XROT tend to be associated with
smaller differences in YROT than in schools where the slopes are steeper.
The regression slope for YDESIRE is about .5, which is the
regression slope for the "stereotypical" school, where every feature is
"average."   The variation associated with this regression slope has a
standard deviation of .9; that is, there is a large (predicted) proportion of
schools where the slope on YDESIRE is very small or even negative. The
correlation of the within-group slopes on XROT and YDESIRE is -.77: lower
"effects" of motivation to succeed are associated with schools where the
initial differences become exaggerated by the end of the year.
The variances associated with categories 3 and 4 of YMOREED
(expectations to complete five or more years of schooling) represent the
variation of the adjusted differences between categories 3 and 1 (expectation
to complete fewer than two more years of education) and 4 and 1, respectively.
While the fitted difference between categories 2 (two to four more years) and
1 is about .8 and constant for all the schools, the average within-school
difference between categories 3 and 1 is 1.1, with a variance of 1.8.
Therefore this difference is negative in several schools. The situation with



52
the categories 4 and 1 contrast is similar, although the number of schools
with the reversed sign of the difference is much smaller. The correlation of
the random effects associated with categories 3 and 4 is .725; a high 3-1
contrast is associated with a high 4-1 contrast; but the fitted variance for
the contrast 4-3 is 1.79 + 1.42 - 2*1.16 - .89, whereas the average difference
is 1.58 - 1.08 - .50. Hence there are schools where the pupils with
YMOREED - 3 have lower adjusted scores on YROT than where YMOREED - 4,
although on average the fourth category is .5 points ahead.
The estimates of the regression parameters differ only marginally
for the different specifications of the random part. This result justifies,
post hoc, our approach of modelling first the regression part of the model and
then the random part. The regression estimates for the last model considered
are given in Table 10.
Conditional MWectations of the Random Effects
In the fixed-effects ANOVA or ANOCOVA, estimates of the effects
associated with the groups are obtained. In variance component models, these
effects are represented by random variables. Conditional upon the adopted
model, the expectations of the (random) group-effects can be considered as the
group-level residuals, or as "estimates" of the group-effects. These
conditional expectations have to be inspected as to whether they conform with
the assumptions of normality. This inspection involves a check for skewness
and kurtosis (not carried out here, but visual inspection indicates no
problems) and a check for outlying values of the effects. The latter check is
obviously also of substantive importance because it would be useful to detect
schools with exceptionally high or low performance, where the categories of



53
Table 10: Fixed-effect Estimates for the Final Model with Random
Effects for 3,025 Students and 86 Classrooms/Schools Using
18 Explanatory Variables, Thailand, 1981-82
VCS
Variable                            Estimate      St. Error
Student Level
GRAND MEAN                           16.642
XROT                                   .617          .020
XAGE                                  -.070          .014
XSEX                                  1.143          .260
YFOCCI                                 .101          .352
-.488          .374
.198         .446
YMEDUC                                 .347          .268
.062         .446
-.491          .560
YMOREED                                .816          .453
1.117         .476
1.618          .514
YPARENC                                .358          .112
YPERCEV                              -1.178          .133
YFUTURE                                .526          .137
YDESIRE                                .480          .217
Group Level
senrolt                                .300          .265
sDutear                               -.063          .048
saualmt                                .781          .380
tmthsub                               2.632          .582
txtbook                               0.949          .431
tworkbk                               -.372          .270
torderl                               -.035          .012
tseatl                                 .007          .006
Variance                              -               -
Pupil-level variance                 35.259
Pupil-level sigma                     5.938
Group-level variance                 See matrix in the text
Group-level sigma
Deviance                         19,064.902
Number of iterations                  8



54
YMOREED have substantially different differences than do average schools, in
which the outcomes are more/less influenced by the initial score XROT.
The complex nature of the variation, involving three variables,
coupled with the number of groups, makes it infeasible to discuss the
deviations of the group-level regressions from the average regression. In
fact, the main motivation for using variance component analysis has been to
obtain a global description of variation, without reference to individual
groups. The added advantage is that owing to the shrinkage property of the
conditional expections, extreme results attributable to unreliability for some
of the schools with small numbers of students are avoided. The conditional
expectations are a mixture of the pooled ordinary least squares solution and
the within-group regression; the weight depends on the amount of information
contained in the data from the group. Conditional expectations are obtained
even for schools where the number of pupils in the data is smaller than the
number of regression parameters. Because of this shrinkage, we cannot
pinpoint all the schools where, say, the difference between categories 3 and 1
has a negative sign. For several schools, the conditional means indicate a
small difference among the categories; some of these may be negative, others
positive and larger than the conditional expectation. Accordingly, we should
downscale our notion of what is an exceptionally large deviation; for example,
a 1.5 multiple of the standard deviation (sigma) should be regarded as
exceptional.
We conclude with an example of an exceptional school. All the
random-effects components of school 22 (42 pupils in the data) are positive.
Its deviation from the average regression formula is



55
1.517 + .100 XROT + .102 YDESIRE + 1.008 YM3 + .842 YM4,
where YM3 (and YM4) are equal to 1 if the pupil is in category 3 (4) and 0
otherwise. This outcome indicates that school 22 is characterized by high
performance, with the differences in initial ability tending to get
exaggerated. That is, pupils with high motivation and high expectations are
at an advantage. For sample mean values of XROT and YDESIRE, this formula
becomes
2.959 + 1.008 YM3 + .842 YM4,
which reflects the high "performance" of the school much more clearly. The
variances quoted above refer to the regression using centered versions of all
the variables (XROT -       , YDESIRE - YDESIRE , YM3 - Y , YM4 - sf).  In
the transformation from one parametrization to the other, only the
intercept-variance is affected.
CHAPTER V: DISCUSSION
At the outset of this paper, we posed a series of questions:
(i) do schools affect student learning differentially? (ii) what part of this
variation is attributable to between school characteristics versus between
student characteristics? (iii) what characteristics of teachers and schools
enhance student achievement, independent of student background? (iv) are



56
these effects uniform across students? (v) what is the comparative
effectiveness of alternative inputs? and (vi) how do estimates obtained from
simple OLS methods compare with estimates obtained from multi-level methods?
During the analysis, a sixth question arose: are there alternative regression
models that predict student achievement equally well as the model developed
herein? In this section, we review our findings and present some caveats
about their interpretation.
Summary
School effects. The first analysis in this paper examined the
extent to which schools differed in their ability to transform pretest scores
into posttest scores. We found that the schools in this sample from Thailand
were equally effective in converting pretest into posttest scores and that
there were essentially no variable slopes in this respect. That is, the
results from the simple variance component model did not differ significantly
from those obtained from the variance component model that included variable
slopes.
Contribution of school versus individual characteristics. In our
second analysis, we examined group and individual effects on total variance.
Group-level effects contributed 32% of the variance, while individual-level
effects contributed 68% of the variance in posttest scores, after controlling
for the pretest scores. We were able to explain most of the group-level
variation but were less successful in explaining individual variation.



57
Effective teacher and school characteristics. The results from
our final analysis indicate that some teacher and school characteristics are
positively associated with student learning in Thailand:
o     The percentage of teachers in the school that are qualified to
teach mathematics
o    an enriched mathematics curriculum and
o    the frequent use of textbooks by teachers.
At the same time, some teaching practices are negatively related to learning:
o    the frequent use of workbooks, and
o    time spent maintaining order in the classroom.
The positive results are not surprising. Teachers who know the
subject matter being taught, a cu.riculum that covers the domain, and
textbooks that provide a structured presentation of the material all should
have positive effects on achievement. The negative results are also
unsurprising. Teachers who spend a great deal of time maintaining classroom
order will have less time available for teaching; therefore, less learning
takes place. Similarly, frequent use of workbooks may detract from effective
teaching, answering questions and so forth.



58
Uniformity of effects. In this sample, we found that the schools
did not have uniform effects on all students. In particular, the effects
differed according to the level of students' expectations about further
education. Some schools/classrooms were more effective for students with low
expectations, some were more effective for students with high expectations,
while others were equally effective (or ineffective) for all types of
students. Interestingly enough, we found little evidence that schools were
differentially effective for students on the basis of gender, age, parental
occupation or several other student attitudes.
Comparative effectiveness of inputs. Overall, we found few school
"inputs" that were associated with differential achievement over time.
Frequent use of textbooks increased achievement by a full point on the
posttest, while use of workbooks decreased achievement by a third of a point;
an enriched curriculum increased posttest scores by over 2.5 points. Each
additional percentage point of teachers qualified to teach mathematics raised
posttest scores by over 1 point.
However, these causal statements do not hold if they are to be
interpreted as the result of an external intervention. Obtaining (additional)
textbooks for the schools is not a simple procedure unrelated to educational
processes and management decisions; it is itself an outcome variable related
to some (unknown) aspects of the educational process. Similarly, discarding
workbooks might not lead to improved outcomes, unless all the circumstances
that lead to reduced use of workbooks are also present or are induced
externally. External intervention will be free of risk only if we have, and
apply, causal models for how the educational system functions. The models
developed in this paper, and elsewhere in the literature on educational



59
research, are purely descriptive. Use of regression methods and of variance
component analysis allows improved description but does not provide inferences
about causal relationships.
In addition, interpretations of the estimates of effects are subject
to a variety of influences, and there may be alternative regression models,
with different variables, that are equally correct in terms of prediction.
Thus, the selection of variables included in this model is responsible, to
some degree, for the results, and a different selection of variables could
yield substantially different rerults with respect to the contribution of each
variable.
Comparison with OLS. The analysis demonstrates that estimates based
on OLS regressions do yield different results, in some cases, from those based
on VC regressions. For example, in comparing the OLS estimates with the VCS
estimates in Figure 6, we see that for tmthsub the coefficients are quite
different. Based on OLS, we would conclude that students in "enriched"
class. z, with the other explanatory variables controlled for, perform about 2
points (13%) higher than those in "normal" or "remedial" classes; the
conclusion based on the VC regression is that they perform nearly 2.6 points
(17%) higher. Combining these effects with cost information permits an
estimation of cost- effectiveness. If enriched classes cost 13% more than
remedial or normal classes, we would conclude that they were either equally
cost-effective (OLS) or more cost-effective (VC) than are remedial/normal
classes, depending on the model. Similarly, if enriched classes cost 17% more
than remedial/normal classes, they would be either equally cost-effective (VC)
or less cost-effective (OLS), depending on the model.



60
However, the caution in the previous subsection about causal
inference applies equally in this context. Classes, or schools, cannot be
declared to have enriched curricula at an external will and by supplying the
outward signs of having enriched curriculum; rather, a whole complex of
related circumstances has to be arranged, e.g., strengthened education in
lower grades, synchronization with other subjects, etc. Since we argued
earlier in the paper that estimates based on VC methods are preferable to
those based on OLS methods, differences of these types could hold important
policy implications for schools deciding on the type of curriculum to choose.
caveats
We have noted that alternative models can yield similar predictions
(in terms of achievement) but might include a different set of variables.
That such could be the case is not a problem limited to VC models; it is a
perennial problem with these general types of analyses. In our analysis, we
included a number of individual pupil and school/classroom variables; in this
respect, we moved well beyond earlier models, which included only modest
"intake" characteristics of students. Identifying the variables associated
with higher outcome scores does not, however, offer a direct answer to the
principal question of a development agency about the distribution of its
resources to a set, or continuum, of intervention policies in an educational
sy3tem. Without any prior knowledge of the educational system, any
justification for an intervention policy based on the results of regression
(or variance component) analysis, or even of structural modelling (LISREL),
has no proper foundation. Certain intervention policies may cause a change in
the educational system, and hence a change in the regression model itself.



61
This new regression model may indicate that the selected ir. ervention is far
from optimal or may even be detrimental.
A case in point is the pretest score XROT. Its coefficient is
positive and of substantial magnitude. A conceivable intervention policy to
raise the XROT scores would be, for example, to provide coaching prior to
administering the pretest. Clearly such an intervention, if effective, could
lead to a change in the regression formula. Alternatively, if coaching took
place between %he pretest and posttest, the regression formula would again be
changed, but differently. Any number of different scenarios is easy to
construct, in which the coefficient on XROT would be close to 1 or
substantially lower than .62 (the level obtained in our analysis).
Similarly, indiscriminant reduction of the time spent maintaining
order in the classroom, probably a less expensive intervention in monetary
terms, is likely to be an unreasonable solution. Introduction of the enriched
mathematics curriculum for all students is most likely not practical, and even
its extension to a few more classrooms may place excessive requirements on
staff in the schools that would lower the quality of instruction in other
subjects and/or other grades.
In conclusion, positive or negative regression coefficients cannot
be regarded uncritically as indicators of cause and effect, or influence. An
intervention should be regarded as an experiment, whose outcome can be
predicted from an observational study only under the unrealistic assumptions
of the regression formula describing accurately the mechanics of a rigid
educational process.



62
This finding does not mean that absolutely no inferences can be made
without a carefully designed experiment. It means that the results of the
statistical analysis based violated assumptions of randomization should be
supplemented with external information about the complex selection processes
and other sources of bias. This adjustment does not submit to a rigorous
treatment, and therefore we can only speculate how different our results would
have been had we carried out a (hypothetical) experiment instead of a survey.
Three important items of information would assist in answering the
question about the allocation of resources:
Mi)   What are the feasibility and cost of various interventions
(ii)   How an intervention will affect other explanatory variables
and which aspects of the educational process will remain
unaltered after the intervention
(iii)   How directly manipulable the "interventions" are.
It is critical to distinguish between the variables that are
manifest (unchangeable, e.g., pupil background), that are manipulable (e.g.,
time spent on a task of a particular kind) and that are manipulable only by
direct intervention. For example, the time spent maintaining discipline is a
manipulable variable, but it can be manipulated either indirectly (e.g., by
making the curriculum more interesting or by providing more suitable or more
interesting textbooks) or directly (by changing teacher behavior so as to



63
ignore disruptive student behavior). Considerations an to effective education
policy require attention to directly manipulable variables. In the present
analysis, these are the qualifications of the mathematics teachers and the use
of textbooks.



64
REFERENCES
Aitkin, M., & N. Longford. (1986). Statistical modelling issues in school
effectiveness studies (with discussion). Journal of the Royal
Statistical Society, Series B, 149:1-43.
Avalos, B., & Haddad, W. (1981). A review of teacher effectiveness research.
Ottawa: International Development Research Centre.
Bryk, A.S., Raudenbush, S.W., Seltzer, M., & Congdon, Jr., R.T. (1986). An
Introduction to HLM: Computer Program and User's Guide, Chicago,
University of Chicago (processed)
Coleman, J., Hoffer, T., & Kilgore, S. (1982). High School Achievement:
Public. Catholic and Private Schools Compared. New York: Basic.
Dempster, A.P., Laird, N.M. & Rubin, D.B. (1977). Maximum likelihood for
incomplete data via the EM algorithm (with discussion). Journal of the
Royal Statistical Society. Series B, 39, 1-38.
Dempster, A.P., Rubin, D.B. & Tsutakawa, R.K. (1981). Estimation in
covariance component models. Journal of the American Statistical
Association, 76, 341-353.
Fuller, B. (1987). Raising school quality in developing countries: What
investments boost learning? Review of Educational Research, 57, 255-291.
Goldstein, H. (1984). The methodology of school comparisons. Oxford Review of
Education, 10, 69-74.
___. (1986).  Multilevel mixed linear model analysis using iterative
generalized least squares. Biometrika, 73, 43-56.
___. (1987).  Multilevel Models in Educational and Social Research, New
York: Oxford University Press.
Harbison R. & E. Hanushek. (1988). Educational performance of the poor:
Lessons from rural northeast Brazil. Draft manuscript.
Heyneman, S.P. (1986). The search for school effects in developing countries:
1966-1986. EDI Seminar Paper No. 33. Washington, D.C.: World Bank.
Heyneman, S.P. & Jamison, D.T. (1980). Student learning in Uganda: Textbook
Availability and other factors, ComRarative Education Review, 23,
206-220
Heyneman, S.P. & Loxley, W. (1983) The effect of primary school quality on
academic achievement across twenty-nine high and low-income countries,
American Journal of Sociology, 88, 1162-1194.



65
Husen, T., Saha, L., & Noonan, R. (1978). Teacher training and student
achievement in less developed countries (Staff Working Paper 310).
Washington DC: The World Bank.
Lindley, D.V. & Smith, A.F.M. (1972). Bayes estimates for the linear model
(with discussion). Journal of the Royal Statistical Society. Series B,
43, 1-41.
Lockheed, M.E., Fuller, B. & Nyirongo, R. (1987). Family effects on student
achievement in Thailand and Malawi, World Bank: Population and Human
Resources Department (processed).
Lockheed, M.E., Vail, S. & Fuller, B. (1987) How textbooks affect achievement
in developing countries: Evidence from Thailand. Educational Evaluation
and Policy Analysis, 8, 379-392
Lockheed, M.E. & Hanushek, E. (1988) Improving educational efficiency in
developing countries: What do we know? Compare, 18, 21-38.
Lockheed, M.E., Foncier J. & Bianchi, L. (1989). Effective primary level
science teaching in the Philippines. "Paper presented at the annual
meeting of the American Sociological Association in San Francisco,
California, Aug. 9-13, 1989
Lockheed, M.E. & Komenan, A. (1989). Teaching quality and student achievement
in Africa: the case of Nigeria and Swaziland. Teaching + teacher
education. Great Britain- Pergamon Press.
Longford, N.T. (1986). VARCL-Interactive software for variance component
analysis. The Professional Statistician, 5, 28-32.
Longford, N.T. (1987). A fast scoring algorithm for maximum likelihood
estimation in unbalanced mixed models with nested random effects.
Biometrika, 74, 817-827.
Mason, W. M., Wong, G. Y. & Entwistle, B. (1984). The multilevel linear
model: A better way to do contextual analysis. Sociological Methodology.
London: Jossey-Bass.
Psacharopoulos, G. & Loxley, W. (1986). Diversified Secondary Education and
Development, Baltimore, MD: Johns Hopkins University Press.
Raudeneush, S.W. (1987). Educational applications of hierarchical linear
models: A review. Michigan State University (processed).
Raudenbush, S.W. & Bryk, A.S. (1986). A hierarchical model for studying
school effects. Sociology of Education, 59, 1-17
Reynolds, D. (1985).  Introduction: Ten years on-a decade of research and
activity in school effectiveness research reviewed. In D. Reynolds (Ed.)
Studying School Effectiveness. London: The Falmer Press.



66
Rosenholtz, S. (1989). Teachers' workplace: The social organization of
schoolg. White Plains, NY: Longman. Sirotnik, K., & Burstein, L.
(1985).
Rutter, M. (1983). School effects on pupil progress: Research findings and
policy implication. Child DeveloRment, 54, 1-29.
Schiefelbein, E., & Simmons, J. (1981). Determinants of school achievement:
A review of research for develoRing countries (mimeo). Ottawa:
International Development Research Centre.
Sirotnik, K.A. & Burstein, L. (1985). Measurement and statistical issues in
multilevel research on schooling. Educational Administration Ouarterly,
21, 169-185.
Willms, J.D. (1987). Differences between Scottish education authorities in
their examination attainment. Oxford Review of Education, 13, 211-231.



PPR Working Paper Series
Contact
IWa                             Author                   Date           for paper
WPS223 Overvalued and Undervalued        Jose Saul Lizondo       August 1989      R. Luz
Exchange Rates in an Equilibrium                                         61588
Optimizing Model
WPS224 The Economics of the Government  Stanley Fischer          May 1989         S. Fischer
Budget Constraint                                                        33774
WPS225 Targeting Assistance to the Poor:   Paul Glewwe           June 1989        B. Rosa
Using Household Survey Data     Oussama Kanaan                           33751
WPS226  Inflation and the Costs of       Andres Solimano         July 1989        E. Khine
Stabilization: Country Experiences,                                      61763
Conceptual Issues, and Policy
Lessons
WPS227  Institutional Reforms in Sector  Samuel Paul             July 1989        E. Madrona
Adjustment Operations                                                    61712
WPS228  Recent Economic Performance of   Robert Lynn             July 1989        M. Divino
Developing Countries            F. Desmond McCarthy                      33739
WPS229 The Effect of Demographic Changes Steven B. Webb          July 1989        E. Khine
on Saving for Life-Cycle Motives    Heidi S. Zia                         61765
in Developing Countries
WPS230 Unemployment, Migration, and      Bent Hansen             July 1989        J. Timmins
Wages in Turkey, 1962-85                                                 39248
WPS231  The World Bank Revised Minimum    Doug Addison           May 1989         J. Onwuemene-
Standard Model: Concepts and                                             Kocha
Issues                                                                   61750
WPS232 Women and Food Security in Kenya Nadine R. Horenstein     June 1989        M. Villar
33752
WPS233  Public Enterprise Reform in      John Nellis             August 1989      R. Malcolm
Adjustment Lending                                                       61708
WPS234 A Consistency Framework           William Easterly        June 1989        R. Luz
Macroeconomic Analysis                                                   61760
WPS235  Borrowing, Resource Transfers,   Steven B. Webb          July 1989        E. Khine
and External Shocks to Developing  Heidi S. Zia                          61765
Countries: Historical and
Counterfactual
WPS236  Education and Earnings in Peru's    Peter Moock          July 1989        M. Fisher
Informal Nonfarm Family Enterprises Philip Musgrove                      34819
Morton Stelcner



PPR Working Paper Series
Contact
illa                             Author                   Dat             for paDer
WPS237 The Curricular Content of Primary   Aaron Benavot          June 1989        C. Cristobal
Education in Developing Countries  David Kamens                          33640
WPS238 The Distributional Consequences of Ehtisham Ahmad          August 1989      A. Bhalla
a Tax Reform On a VAT for Pakistan Stephen Ludlow                         60359
WPS239  The Choice Between Unilateral and  Julio Nogues           July 1989        S. Torrijos
Multilateral Trade Liberalization                                        33709
Strategies
WPS240 The Public Role in Private         Ake Blomqvist           August 1989      A. Bhalla
Post-Secondary Education:       Emmanuel Jimenez                          61059
A Review of Issues and Options
WPS241  The Effect of Job Training on     Ana-Maria Arriagada     July 1989        C. Cristobal
Peruvian Women's Employment and                                          33640
Wages
WPS242  A Multi-Level Model of School     Marlaine E. Lockheed    July 1989        C. Cristobal
Effectiveness in a Developing   Nicholas T. Longford                     33640
Country
WPS243  Averting Financial Crisis - Kuwait   Fawzi H. Al-Sultan   July 1989        R. Simaan
72167
WPS244  Do Caribbean Exporters Pay Higher Alexander J. Yeats      July 1989        J. Epps
Freight Costs?                                                           33710
WPS245  Developing a Partnership of       Peter Poole             August 1989      S. Davis
Indigenous Peoples, Conservationists,                                    38622
and Land Use Planners in Latin America
WPS246  Causes of Adult Deaths in         Richard Hayes           July 1989        S. Ainsworth
Developing Countries: A Review of Thierry Mertens                        31091
Data and Methods                Geraldine Lockett
Laura Rodrigues
WPS247  Macroeconomic Policies for        Carlos AHredo Rodriguez  August 1989     R Luz
Structural Adjustment                                                     61588
WPS248  Private Investment, Government    Mansoor Dailami         August 1989      M. Raggambi
Policy, and Foreign Capital in  Michael Walton                            61696
Zimbabwe
WPS249  The Determinants of Hospital Costs: Ricardo Bitran-Dicowsky   August 1989  V. Israel
An Analysis of Ethiopia         David W. Dunlop                           48121
WPS250  The Baker Plan: Progress,         William R. Cline        August 1989      S. King-Watson
Shortcomings, and Future                                                  33730