National Assessments of Educational Achievement

VOLUME 1


Assessing National
Achievement Levels
in Education


Vincent Greaney
Thomas Kellaghan
© 2008 The International Bank for Reconstruction and Development / The World
Bank
1818 H Street NW
Washington, DC 20433
Telephone: 202-473-1000
Internet: www.worldbank.org
E-mail: feedback@worldbank.org

All rights reserved
1 2 3 4 10 09 08 07

This volume is a product of the staff of the International Bank for Reconstruction
and Development / The World Bank. The findings, interpretations, and conclu-
sions expressed in this volume do not necessarily reflect the views of the Executive
Directors of The World Bank or the governments they represent.
   The World Bank does not guarantee the accuracy of the data included in this
work. The boundaries, colors, denominations, and other information shown on any
map in this work do not imply any judgment on the part of The World Bank
concerning the legal status of any territory or the endorsement or acceptance of
such boundaries.

Rights and Permissions
The material in this publication is copyrighted. Copying and/or transmitting por-
tions or all of this work without permission may be a violation of applicable law.
The International Bank for Reconstruction and Development / The World Bank
encourages dissemination of its work and will normally grant permission to repro-
duce portions of the work promptly.
   For permission to photocopy or reprint any part of this work, please send a re-
quest with complete information to the Copyright Clearance Center Inc., 222
Rosewood Drive, Danvers, MA 01923, USA; telephone: 978-750-8400; fax: 978-
750-4470; Internet: www.copyright.com.
   All other queries on rights and licenses, including subsidiary rights, should be
addressed to the Office of the Publisher, The World Bank, 1818 H Street NW,
Washington, DC 20433, USA; fax: 202-522-2422; e-mail: pubrights@worldbank.org.

Cover design: Naylor Design, Washington, DC
ISBN-13: 978-0-8213-7258-6
eISBN: 978-0-8213-7259-3
DOI: 10.1596/978-0-8213-7258-6

Library of Congress Cataloging-in-Publication Data
Assessing national achievement levels in education / [edited by] Vincent Greaney
and Thomas Kellaghan.
  p. cm.
 Includes bibliographical references.
 ISBN 978-0-8213-7258-6 (alk. paper) — ISBN 978-0-8213-7259-3
 1. Educational tests and measurements. 2. Educational evaluation. I. Greaney,
Vincent. II. Kellaghan, Thomas.
 LB3051.A7663 2007
 371.26'2—dc22
 2007022161
                  CONTENTS



PREFACE                                                          ix

ACKNOWLEDGMENTS                                                  xi

ABBREVIATIONS                                                   xiii


 1. INTRODUCTION                                                  1

 2. NATIONAL ASSESSMENTS OF STUDENT ACHIEVEMENT                   7
    What Are the Main Elements in a National Assessment?        12
    How Does a National Assessment Differ from
      Public Examinations?                                      14

 3. WHY CARRY OUT A NATIONAL ASSESSMENT?                        17

 4. DECISIONS IN A NATIONAL ASSESSMENT                          23
    Who Should Give Policy Guidance for the
      National Assessment?                                      23
    Who Should Carry Out the National Assessment?               25
    Who Will Administer the Tests and Questionnaires?           29
    What Population Will Be Assessed?                           30
    Will a Whole Population or a Sample Be Assessed?            32
    What Will Be Assessed?                                      34
    How Will Achievement Be Assessed?                           39
    How Frequently Will Assessments Be Carried Out?             43
    How Should Student Achievement Be Reported?                 44
    What Kinds of Statistical Analyses Should Be Carried Out?   46


          How Should the Results of a National Assessment
            Be Communicated and Used?                               48
          What Are the Cost Components of a National Assessment?    49
          Summary of Decisions                                      52

     5. ISSUES IN THE DESIGN, IMPLEMENTATION, ANALYSIS,
           REPORTING, AND USE OF A NATIONAL ASSESSMENT             53
          Design                                                    53
          Implementation                                            55
          Analysis                                                  57
          Report Writing                                            59
          Dissemination and Use of Findings                         60

     6. INTERNATIONAL ASSESSMENTS OF STUDENT
          ACHIEVEMENT                                              61
          Growth in International Assessment Activity               63
          Advantages of International Assessments                   66
          Problems with International Assessments                   70

     7. CONCLUSION                                                 77

APPENDIXES

 A. COUNTRY CASE STUDIES                                           85
          India                                                     85
          Vietnam                                                   87
          Uruguay                                                   90
          South Africa                                              92
          Sri Lanka                                                 95
          Nepal                                                     97
          Chile                                                     99
          United States                                            101
          Uganda                                                   103

 B. INTERNATIONAL STUDIES                                          109
          Trends in International Mathematics and Science Study    109
          Progress in International Reading Literacy Study         114
          Programme for International Student Assessment           119

 C. REGIONAL STUDIES                                               127
          Southern and Eastern Africa Consortium for Monitoring
            Educational Quality                                    127
          Programme d’Analyse des Systèmes Éducatifs de la
            CONFEMEN                                               135
        Laboratorio Latinoamericano de Evaluación de la
          Calidad de la Educación                                        139


REFERENCES                                                               145

INDEX                                                                    155


BOXES
2.1      Ethiopia: National Assessment Objectives                         10
2.2      Examples of Questions Addressed by Vietnam’s
         National Assessment                                              11
2.3      Main Elements of a National Assessment                           12
4.1      Proposed NSC Membership in Sierra Leone                          24
4.2      Examples of Multiple-Choice Items                                41
4.3      Examples of Open-Ended Items                                     41
6.1      South Africa’s Experience with International Assessments         75


FIGURES
3.1      The Achievement Gap in the United States for Nine-Year-Old
         Students: NAEP Reading Assessment, 1971–99                    19
3.2      Percentages of Fourth Grade Students at or above “Proficient”
         in Reading, NAEP 1992–2003                                    20
4.1      Mean Percentage Correct Scores for Students’ Mathematics
         Performance, by Content Area, Lesotho                         45
A.9.1    Grade 6 Literacy Test Score Distribution in Uganda           106
B.3.1    Sample of PISA Mathematics Items                             121
B.3.2    PISA Mean Reading Literacy Scores and Reading
         Subscale Scores                                              123
B.3.3    Student Proficiency Levels in PISA Mathematics                124
B.3.4    Percentage of Students at Each Proficiency Level on PISA
         Mathematics Scale                                            125
B.3.5    Percentage of Students at Each Proficiency Level on
         PISA Reading Scale                                           126
C.1.1    Percentage of Grade 6 Students Reaching Proficiency Levels in
         SACMEQ Reading, 1995–98                                      133
C.1.2    Changes in Literacy Scores between SACMEQ I and
         SACMEQ II                                                    134
C.2.1    Percentage of Grade 5 Pupils with Low Achievement, PASEC,
         1996–2001                                                    138
C.3.1    Socioeconomic Gradients for 11 Latin American Countries,
         LLECE                                                        143

TABLES
2.1        Differences between National Assessments and Public
           Examinations                                                14
4.1        Options for Implementing a National Assessment              26
4.2        Advantages and Disadvantages of Census-Based Assessment to
           Hold Schools Accountable                                    34
4.3        PIRLS Reading Comprehension Processes                       36
4.4        Percentage Achieving Goal or Mastery Level by Grade,
           Connecticut, 2006                                           47
4.5        Bodies with Primary Responsibility for Decisions in a
           National Assessment                                         52
6.1        Comparison of TIMSS and PISA                                64
6.2        Percentage of Students Reaching TIMSS International
           Benchmarks in Mathematics, Grade 8: High- and
           Low-Scoring Countries                                       73
A.2.1      Percentages and Standard Errors of Pupils at Different
           Skill Levels in Reading                                     89
A.2.2      Relationship between Selected Teacher Variables and
           Mathematics Achievement                                     89
A.5.1      Background Data and Source in Sri Lankan
           National Assessment                                         96
A.5.2      Percentage of Students Achieving Mastery in
           the First Language, by Province                             97
A.7.1      Index for Merit Awards for Schools in Chile, 1998–99       101
A.9.1      Percentages of Uganda Grade 3 Pupils Rated Proficient in
           English Literacy, 2005                                     105
B.1.1      Target Percentages of the TIMSS 2007 Mathematics
           Tests Devoted to Content and Cognitive Domains,
           Fourth and Eighth Grades                                   111
B.1.2      TIMSS Distribution of Mathematics Achievement, Grade 8 113
B.2.1      Percentages of Students Reaching PIRLS Benchmarks in
           Reading Achievement, Grade 4                               118
C.3.1      Percentage of Students Who Reached Each Performance
           Level in Language, by Type of School and Location,
           LLECE 1997                                                 141
C.3.2      Percentage of Students Who Reached Each Performance
           Level in Mathematics, by Type of School and Location,
           LLECE 1997                                                 142
                    PREFACE



In a speech to mark the first 100 days of his presidency of the World
Bank Group, Robert Zoellick outlined six strategic themes to guide
the Bank’s work in promoting an inclusive and sustainable globaliza-
tion. One of those themes focused on the role of the Bank as “a unique
and special institution of knowledge and learning. . . . a brain trust of
applied experience.” Zoellick noted that this role requires the Bank
“to focus continually and rigorously on results and on the assessment
of effectiveness.”
   This challenge is greatest in education, where the large body of
empirical evidence linking education to economic growth indicates
that improved enrollment and completion rates are necessary, but not
sufficient, conditions for poverty reduction. Instead, enhanced learning
outcomes—in the form of increased student knowledge and cognitive
skills—are key to alleviating poverty and improving economic com-
petitiveness (and will be crucial for sustaining the gains achieved in
education access to date). In other words, the full potency of education
in relation to economic growth can only be realized if the education
on offer is of high quality and student knowledge and cognitive skills
are developed.
   The available evidence indicates that the quality of learning out-
comes in developing countries is very poor. At the same time, few of
these countries systematically monitor such outcomes either through
conducting their own assessments of student achievement or through
participating in regional or international assessments. The lack of this
type of regular, system-level information on student learning makes it
difficult to gauge overall levels of achievement, to assess the relative
performance of particular subgroups, and to monitor changes in per-
formance over time. It also makes it difficult to determine the effec-
tiveness of government policies designed to improve outcomes in these
and other areas.
   This is a core issue for the Bank and its client countries as the focus
shifts from access to achievement. It also is an area in which there is a
dearth of tools and resources suited to the needs of developing countries.
This series of books, edited by Vincent Greaney and Thomas Kellaghan,
contributes in a significant way to closing this gap. The series is designed
to address many of the issues involved in making learning outcomes a
more central part of the educational agenda in lower-income countries.
It will help countries to develop capacity to measure national levels of
student learning in more valid, sustainable, and systematic ways. Such
capacity will hopefully translate into evidence-based policymaking that
leads to observable improvement in the quality of student learning. It
is an important building block toward achieving the real promise of
education for dynamic economies.

Marguerite Clarke
Senior Education Specialist
The World Bank
                   ACKNOWLEDGMENTS



A team led by Vincent Greaney (consultant, Human Development
Network, Education Group, World Bank) and Thomas Kellaghan
(Educational Research Centre, St. Patrick’s College, Dublin) prepared
this series of books.
   Other contributors to the series were Sylvia Acana (Uganda
National Examinations Board), Prue Anderson (Australian Council
for Educational Research), Fernando Cartwright (Canadian Council
on Learning), Jean Dumais (Statistics Canada), Chris Freeman (Aus-
tralian Council for Educational Research), Hew Gough (Statistics
Canada), Sara Howie (University of Pretoria), George Morgan
(Australian Council for Educational Research), T. Scott Murray
(DataAngel Policy Research) and Gerry Shiel (Educational Research
Centre, St. Patrick’s College, Dublin).
   The work was carried out under the general direction of Ruth
Kagia, World Bank Education Sector Director, and Robin Horn,
Education Sector Manager. Robert Prouty initiated and supervised
the project up to August 2007. Marguerite Clarke supervised the
project in its later stages through review and publication. We are
grateful for contributions of the review panel: Al Beaton (Boston
College), Irwin Kirsch (Educational Testing Service), and Benoit
Millot (World Bank).
   Additional peer-review comments were provided by a number of
World Bank staff, including Carlos Rojas, Eduardo Velez, Elizabeth
King, Harry Patrinos, Helen Abadzi, Jee-Peng Tan, Marguerite
Clarke, Maureen Lewis, Raisa Venalainen, Regina Bendokat, Robert
Prouty, and Robin Horn.
   Special thanks are due to Aidan Mulkeen and to Sarah Plouffe.
We received valuable support from Cynthia Guttman, Matseko
Ramokoena, Aleksandra Sawicka, Pam Spagnoli, Beata Thorstensen,
Myriam Waiser, Peter Winograd, and Hans Wagemaker. We are also
grateful to Patricia Arregui, Harsha Aturupane, Luis Benveniste,
Jean-Marc Bernard, Carly Cheevers, Zewdu Gebrekidan, Venita
Kaul, Pedro Ravela, and Kin Bing Wu.
   We wish to thank the following institutions for permission to
reproduce material: Examinations Council of Lesotho, International
Association for the Evaluation of Educational Achievement, National
Center for Education Statistics of the U.S. Department of Education,
the Organisation for Economic Co-operation and Development, and
the Papua New Guinea Department of Education.
   Hilary Walshe helped prepare the manuscript. Book design, editing,
and production were coordinated by Mary Fisk and Paola Scalabrin of
the World Bank’s Office of the Publisher.
   The Irish Educational Trust Fund; the Bank Netherlands Partnership
Program; the Educational Research Centre, Dublin; and the Australian
Council for Educational Research have generously supported prepara-
tion and publication of this series.
            ABBREVIATIONS



CONFEMEN   Conférence des Ministres de l’Education des Pays
           ayant le Français en Partage
DiNIECE    Dirección Nacional de Información y Evaluación de
           la Calidad Educativa (Argentina)
EFA        Education for All
IEA        International Association for the Evaluation of
           Educational Achievement
IIEP       International Institute for Educational Planning
LLECE      Laboratorio Latinoamericano de Evaluación de la
           Calidad de la Educación
MOE        ministry of education
MESyFOD    Modernización de la Educación Secundaria y
           Formación Docente (Uruguay)
NAEP       National Assessment of Educational Progress
           (United States)
NAPE       National Assessment of Progress in Education
           (Uganda)
NSC        national steering committee
OECD       Organisation for Economic Co-operation and
           Development
PASEC      Programme d’Analyse des Systèmes Éducatifs de la
           CONFEMEN
PIRLS      Progress in International Reading Literacy Study
PISA     Programme for International Student Assessment
SACMEQ   Southern and Eastern Africa Consortium for
         Monitoring Educational Quality
SIMCE    Sistema de Medición de la Calidad de la Educación
         (Chile)
SNED     National System of Teacher Performance Assess-
         ment in Publicly Supported Schools (Chile)
SSA      Sarva Shiksha Abhiyan (India)
TA       technical assistance
TIMSS    Trends in International Mathematics and Science
         Study
UMRE     Unidad de Medición de Resultados Educativos
         (Uruguay)
UNEB     Uganda National Examinations Board
UNESCO   United Nations Educational, Scientific, and Cul-
         tural Organization
CHAPTER 1

INTRODUCTION

In this introductory book, we describe the main
     features of national and international assessments, both of which
     became extremely popular tools for determining the quality of educa-
     tion in the 1990s and 2000s. This increase in popularity reflects two
     important developments. First, it reflects increasing globalization and
     interest in global mandates, including Education for All (UNESCO
     2000). Second, it represents an overall shift in emphasis in assessing the
     quality of education from a concern with inputs (such as student par-
     ticipation rates, physical facilities, curriculum materials, and teacher
     training) to a concern with outcomes (such as the knowledge and skills
     that students have acquired as a result of their exposure to schooling)
     (Kellaghan and Greaney 2001b). This emphasis on outcomes can, in
     turn, be considered an expression of concern with the development of
     human capital in the belief (a) that knowledge is replacing raw materi-
     als and labor as resources in economic development and (b) that the
     availability of human knowledge and skills is critical in determining a
     country’s rate of economic development and its competitiveness in an
     international market (Kellaghan and Greaney 2001a). A response to
     this concern has required information on the performance of education
      systems, which, in turn, has involved a shift from the traditional use of
achievement tests to assess individual students toward their use to
obtain information about the achievements of the system of educa-
tion as a whole (or a clearly defined part of the system).
   The development of national assessment capacity has enabled
ministries of education—as part of their management function—to
describe national levels of learning achievement, especially in key subject
areas, and to compare achievement levels of key subgroups (such as
boys and girls, ethnic groups, urban and rural students, and public and
private school students). It has also provided evidence that enables
ministries to support or refute claims that standards of student achieve-
ment are rising or falling over time.
   Despite growth in national and international assessment activity, a
lack of appreciation still exists in many quarters about the potential
value of the data that assessments can provide, as well as a deficit in
the skills required to carry out a technically sound assessment. Even
when countries conduct a national assessment or participate in an
international one, the information yielded by the assessment is
frequently not fully exploited. A number of reasons may account for
this: the policy makers may have been only peripherally involved in
the assessment and may not have been fully committed to it; the
results of analyses may not have been communicated in a form that
was intelligible to policy makers; or the policy makers may not have
fully appreciated the implications of findings for social policy in
general or for educational policy in particular relating to curricular
provision, the allocation of resources, the practice of teaching, and
teachers’ professional development.
   This series of books is designed to address such issues by introduc-
ing readers to the complex technology that has grown up around the
administration of national and international assessments. This intro-
ductory book describes key national assessment concepts and proce-
dures. It is intended primarily for policy makers and decision makers
in education. The purposes and main features of national assessments
are described in chapter 2 (see also appendix A). The reasons for
carrying out a national assessment are considered in chapter 3, and
the main decisions that have to be made in the design and planning of
an assessment are covered in chapter 4. Issues (as well as common
errors) to be borne in mind in the design, implementation, analysis,
reporting, and use of a national assessment are identified in chapter 5.
In chapter 6, international assessments of student achievement, which
share many procedural features with national assessments (such as
sampling, administration, background data collected, and methods of
analysis—see appendix B), are described.
    The main point of difference between national and international
assessments highlights both a strength and a weakness of an inter-
national assessment. The strength is that an international assessment
provides data from a number of countries, thereby allowing each
country to compare the results of its students with the results achieved
by students in other countries. The weakness is that the requirement
that test instruments be acceptable in all participating countries
means that they may not accurately reflect the range of achievements
of students in individual countries.
    A further feature of international assessments is that many partici-
pating countries carry out internal analyses that are based on data
collected within a country. Thus, the data collected for the inter-
national study can be used for what is, in effect, a national assess-
ment. However, the practice is not without its problems, and the data
that are collected in this way may be less appropriate for policy than
if they had been collected for a dedicated national assessment.
    An intermediate procedure that lies between national assessments
in individual countries and large-scale international studies that span
the globe is the regional study in which a number of countries in a
region that may share many socioeconomic and cultural features
collaborate in a study (see appendix C).
    A further variation is a subnational assessment in which an assess-
ment is confined to a region (a province or state) within a country.
Subnational assessments have been carried out in a number of large
countries (such as Argentina, Brazil, and the United States) to meet
local or regional information needs. Those exercises are relatively
independent and differ from national assessments in that participants
in all regions within a country do not respond to the same instru-
ments and procedures; thus, direct comparisons of student achieve-
ment between regions are not possible.
    In the final chapter of this volume, some overall conclusions are
presented, together with consideration of conditions relating to the
development and institutionalization of national assessment capacity
and to the optimal use of assessment findings. At the end of the book,
the main features of national assessments in nine countries are described
(appendix A), followed by descriptions of three international studies
(appendix B) and three regional studies (appendix C).
   Subsequent books in this series provide details of the design and
implementation of a national assessment. The books are designed to
provide those directly involved in the tasks of constructing tests and
questionnaires and of collecting, analyzing, or describing data in a
national assessment with an introduction to—and basic skills in—key
technical aspects of the tasks involved.
   The second book, Developing Tests and Questionnaires for a National
Assessment of Educational Achievement, has sections on developing (a)
achievement tests, (b) questionnaires, and (c) administration manuals.
The first section addresses the design of achievement tests and the role
that a test framework and blueprint or table of specifications plays in
the design. It describes the process of item writing and gives examples
of various item types, including multiple-choice, short-answer, and
open-ended response items. It also describes the item review or panel-
ing process, an essential exercise to ensure test-content validity. It
includes guidelines for conducting pretests, selecting items for the final
test, and producing the final version of a test. The section concludes
with a brief treatment of training scorers or raters and hand-scoring test
items. The second section describes steps in the construction of ques-
tionnaires: designing a questionnaire, writing items, scoring and coding
responses, and linking data derived from the questionnaire with stu-
dents’ achievement scores. The final section describes the design and
content of an administration manual and the selection and role of a test
administrator. The book has an accompanying CD, which contains test
and questionnaire items released from national and international
assessments and a test administration manual.
   Implementing a National Assessment of Educational Achievement,
the third book in the series, is also divided into three sections. The
first section focuses on practical issues to be addressed in implement-
ing a large-scale national assessment program. It covers planning,
budgeting, staffing, arranging facilities and equipment, contacting
schools, selecting test administrators, packing and shipping, and
ensuring test security. This section also covers the logistical aspects
of test scoring, data cleaning, and report writing. The second section
includes a step-by-step guide designed to enable assessment teams to
draw an appropriate national sample. It includes a CD with sampling
software and a training dataset to be used in conjunction with the
guide. Topics addressed are defining the population to be assessed,
creating a sampling frame, calculating an appropriate sample size,
sampling with probability proportional to size, and conducting mul-
tistage sampling. Data cleaning and data management are treated in
the final section. This section is also supported by a CD with step-
by-step exercises to help users prepare national assessment data for
analysis. Procedures for data verification and data validation, includ-
ing “wild codes” and within-file and between-file consistency checks,
are described.
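   The step-by-step sampling guide mentioned above includes selection of schools with probability proportional to size (PPS). As a rough illustration of the underlying idea only, and not of the procedures or software supplied with that volume, the following Python sketch draws a systematic PPS sample from an invented frame of school enrollments.

```python
import random

# Hypothetical sampling frame of (school_id, enrollment); the figures are invented.
frame = [("S01", 420), ("S02", 150), ("S03", 800), ("S04", 310),
         ("S05", 95), ("S06", 560), ("S07", 240), ("S08", 675)]

def pps_systematic(frame, n_schools):
    """Systematic selection with probability proportional to size (enrollment)."""
    total = sum(size for _, size in frame)
    interval = total / n_schools                 # sampling interval
    start = random.uniform(0, interval)          # random start within the first interval
    points = [start + i * interval for i in range(n_schools)]

    selected, cumulative = [], 0
    remaining = iter(points)
    point = next(remaining)
    for school, size in frame:
        cumulative += size
        # A school is selected each time a selection point falls within its
        # cumulative-enrollment range; very large schools can be hit more than once.
        while point is not None and point <= cumulative:
            selected.append(school)
            point = next(remaining, None)
    return selected

print(pps_systematic(frame, n_schools=3))        # for example, ['S01', 'S03', 'S06']
```

   Because larger schools are more likely to be drawn, each selected school can then contribute roughly the same number of students while keeping every student's overall chance of selection comparable.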
   Analyzing Data from a National Assessment of Educational Achieve-
ment, the fourth book, is supported by two CDs, which require users
to apply statistical procedures to datasets and to check their mastery
levels against solutions depicted on screenshots in the text. The first
half of the book deals with the generation of item-level data using
both classical test and item response theory approaches. Topics
addressed include analyzing pilot and final test items, monitoring
change in performance over time, building a test from previously cre-
ated items, equating, and developing performance or proficiency lev-
els. The second half of the book is designed to help analysts carry out
basic-level analysis of national assessment results and includes
sections on measures of central tendency and dispersion, mean score
differences, identification of high and low achievers, correlation,
regression, and visual representation of data.
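   As an informal illustration of the item-level statistics referred to above, and not code from the book or its CDs, the two classical test theory quantities most often reported are an item's difficulty (the proportion of students answering correctly) and its discrimination (the correlation between the item score and the total test score). The short Python sketch below computes both from an invented matrix of scored responses.

```python
# Classical item statistics from a 0/1 scored response matrix.
# Rows are students, columns are items; the data are invented for illustration.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
]

def mean(values):
    return sum(values) / len(values)

def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

totals = [sum(row) for row in responses]         # each student's total score
for item in range(len(responses[0])):
    scores = [row[item] for row in responses]
    difficulty = mean(scores)                    # proportion answering correctly
    discrimination = pearson(scores, totals)     # uncorrected item-total correlation
    print(f"item {item + 1}: difficulty = {difficulty:.2f}, "
          f"discrimination = {discrimination:.2f}")
```

   Operational analyses refine this in many ways, for example by correcting the item-total correlation and fitting item response theory models, but the logic of flagging very easy, very hard, or poorly discriminating items is the same.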
   Reporting and Using Results from a National Assessment of Educa-
tional Achievement, the final book in the series, focuses on writing
reports in a way that will influence policy. It introduces a methodol-
ogy for designing a dissemination and communication strategy for a
national assessment program. It also describes the preparation of a
technical report, press releases, briefings for key policy makers, and
reports for teachers and other specialist groups. The second section of
the book highlights ways that countries have actually used the results
of national assessments for policy making, curriculum reform, resource
allocation, teacher training, accountability, and monitoring of changes
in achievement and other variables over time.
   Those who study the content of these books and who carry out the
specified exercises should acquire the basic skills required for a national
assessment. They should, however, bear in mind three factors. First,
they should not regard the books as providing simple formulas or algo-
rithms to be applied mechanically but should be prepared to exercise
judgment at varying points in the national assessment (for example, in
selection of test content, in sampling, and in analysis). Judgment in
these matters should improve with experience. Second, users may, on
occasion, require the advice of more experienced practitioners in mak-
ing their judgments. Third, users should be prepared to adapt to the
changes in knowledge and technology that will inevitably occur in the
coming years.
CHAPTER 2

NATIONAL ASSESSMENTS OF STUDENT ACHIEVEMENT

We begin the chapter by defining a national assess-
     ment and listing questions that a national assessment would be designed
     to answer. A listing of the main elements of a national assessment
     follows. Finally, we consider the differences between a national assess-
     ment and public examinations.
        A national assessment is designed to describe the achievement of
     students in a curriculum area aggregated to provide an estimate of the
     achievement level in the education system as a whole at a particular
     age or grade level. It provides data for a type of national education
     audit carried out to inform policy makers about key aspects of the
     system. Normally, it involves administration of achievement tests
     either to a sample or to a population of students, usually focusing on
     a particular sector in the system (such as fifth grade or 13-year-old
     students). Teachers and others (for example, parents, principals, and
     students) may be asked to provide background information, usually
     in questionnaires, which, when related to student achievement, can
     provide insights about how achievement is related to factors such as
     household characteristics, levels of teacher training, teachers’ atti-
     tudes toward curriculum areas, teacher knowledge, and availability of
      teaching and learning materials.
   National assessment systems in various parts of the world tend to
have common features. All include an assessment of students’ language
or literacy and of students’ mathematics abilities or numeracy. Some
systems assess students’ achievements in a second language, science,
art, music, or social studies. In practically all national assessment
systems, students at the primary-school level are assessed. In many
systems, national assessments are also carried out in secondary school,
usually during the period of compulsory education.
   Differences also exist in national assessment systems from country to
country. First, they differ in the frequency with which assessments are
carried out. In some countries, an assessment is carried out every year,
although the curriculum area that is assessed may vary from year to year.
In other systems, assessments are less frequent. Second, they differ in the
agency that carries out an assessment. In some systems, the ministry of
education carries out the assessment; in others, the assessment is by a
national research center, a consortium of educational bodies, a university,
or an examination board. Third, participation by a school may be vol-
untary or may be mandated. When voluntary, nonparticipation of some
schools will almost invariably bias the results and lead to an inaccurate
reflection of achievement levels in the education system.
   Although most industrial countries have had systems of national as-
sessment for some time, it was not until the 1990s that the capacity to
administer assessments became more widely available in other parts of
the world. For example, rapid development in the establishment of
national assessments took place during the 1990s in Latin American
and Caribbean countries, often to provide baseline data for educational
reforms (Rojas and Esquivel 1998). The development represented a
shift in the assessment of quality from emphasis on educational inputs
to outcomes following the Jomtien Declaration (see World Declaration
on Education for All 1990). Article 4 of the Jomtien Declaration states
that the focus of basic education should be “on actual learning acquisi-
tion and outcome, rather than exclusively upon enrolment, continued
participation in organized programs and completion of certification re-
quirements” (World Declaration on Education for All 1990, 5). More
recently, the Dakar Framework for Action (UNESCO 2000), which
was produced at the end of the 10-year follow-up to Jomtien, again
highlighted the importance of learning outcomes. Among its list of
seven agreed goals was, by 2015, to improve “all aspects of the quality
of education . . . so that recognised and measurable outcomes are
achieved by all, especially in literacy, numeracy, and essential life skills”
(UNESCO 2000, iv, 7).
   These statements imply that, for countries pledged to achieving the
goals of Education for All (EFA), efforts to enhance the quality of
education will have to be accompanied by procedures that will provide
information on students’ learning. As a result, national governments
and donor agencies have greatly increased support for monitoring
student achievement through national assessments. The assumption is
frequently made not only that national assessments will provide
information on the state of education, but also that use of the
information should lead to improvement in student achievements.
Whether this improvement ultimately happens remains to be seen. So
far, the expectation that EFA and regular monitoring of achievement
levels would result in an improvement in learning standards does not
seem to have materialized (Postlethwaite 2004). This outcome may be
because—although EFA led to rapid increases in numbers attending
school—larger numbers were not matched by increased resources
(especially trained teachers). Furthermore, the information obtained
from assessments has often been of poor quality, and even when it has
not, it has not been systematically factored into decision making.
   All national assessments seek answers to one or more of the following
questions:

• How well are students learning in the education system (with ref-
  erence to general expectations, aims of the curriculum, preparation
  for further learning, or preparation for life)?
• Does evidence indicate particular strengths and weaknesses in
  students’ knowledge and skills?
• Do particular subgroups in the population perform poorly? Do
  disparities exist, for example, between the achievements of (a) boys
  and girls, (b) students in urban and rural locations, (c) students from
  different language or ethnic groups, or (d) students in different
  regions of the country?
• What factors are associated with student achievement? To what
  extent does achievement vary with characteristics of the learning
  environment (for example, school resources, teacher preparation
  and competence, and type of school) or with students’ home and
  community circumstances?
• Are government standards being met in the provision of resources (for
  example, textbooks, teacher qualifications, and other quality inputs)?
• Do the achievements of students change over time? This question
  may be of particular interest if reforms of the education system are
  being undertaken. Answering the question requires carrying out as-
  sessments that yield comparable data at different points in time
  (Kellaghan and Greaney 2001b, 2004).

   Most of those questions were addressed in the design and imple-
mentation of Ethiopia’s national assessment (see box 2.1).
   A feature of Vietnam’s approach to national assessment, in addition
to assessing student achievement, was a strong focus on key inputs,
such as physical conditions in schools, access to educational materials,
and teacher qualifications (see box 2.2).



 BOX 2.1


     Ethiopia: National Assessment Objectives

     1. To determine the level of student academic achievement and attitude
        development in Ethiopian primary education.

     2. To analyze variations in student achievement by region, gender, location,
        and language of instruction.

     3. To explore factors that influence student achievement in primary education.
     4. To monitor the improvement of student learning achievement from the first
        baseline study in 1999/2000.

     5. To build the capacity of the education system in national assessment.

     6. To create reliable baseline data for the future.

     7. To generate recommendations for policy making to improve educational
        quality.
     Source: Ethiopia, National Organisation for Examinations 2005.

BOX 2.2


 Examples of Questions Addressed by Vietnam’s
 National Assessment
 Questions Related to Inputs

 • What are the characteristics of grade 5 pupils?

 • What are the teaching conditions in grade 5 classrooms and in primary
   schools?
 • What is the general condition of the school buildings?

 Questions Related to Standards of Educational Provision

 • Were ministry standards met regarding

    — Class size?

    — Classroom furniture?

    — Qualifications of staff members?

 Questions Related to Equity of School Inputs

 • Was there equity of resources among provinces and among schools within
   provinces in terms of

    — Material resource inputs?

    — Human resource inputs?

 Questions Related to Achievement

 • What percentage of pupils reached the different levels of skills in reading
   and mathematics?

 • What was the level of grade 5 teachers in reading and mathematics?

 Questions Related to Influences on Achievement

 • What were the major factors accounting for the variance in reading and
   mathematics achievement?

 • What were the major variables that differentiated between the most and
   least effective schools?
 Source: World Bank 2004.

WHAT ARE THE MAIN ELEMENTS IN A NATIONAL
ASSESSMENT?

Although national assessments can vary in how they are implemented,
they tend to have a number of common elements (see box 2.3 and
Kellaghan and Greaney 2001b, 2004).

 BOX 2.3


     Main Elements of a National Assessment

     • The ministry of education (MOE) appoints either an implementing agency
       within the ministry or an independent external body (for example, a
       university department or a research organization), and it provides funding.

     • The MOE determines policy needs to be addressed in the assessment,
       sometimes in consultation with key education stakeholders (for example,
       teachers’ representatives, curriculum specialists, business people, and
       parents).

     • The MOE, or a steering committee nominated by it, identifies the popula-
       tion to be assessed (for example, fourth grade students).

     • The MOE determines the area of achievement to be assessed (for example,
       literacy or numeracy).

     • The implementing agency defines the area of achievement and describes it
       in terms of content and cognitive skills.
     • The implementing agency prepares achievement tests and supporting
       questionnaires and administration manuals, and it takes steps to ensure
       their validity.

     • The tests and supporting documents are pilot-tested by the implementing
       agency and subsequently are reviewed by the steering committee and
       other competent bodies to (a) determine curriculum appropriateness and
       (b) ensure that items reflect gender, ethnic, and cultural sensitivities.

     • The implementing agency selects the targeted sample (or population) of
       schools or students, arranges for printing of materials, and establishes
       communication with selected schools.

     • The implementing agency trains test administrators (for example, class-
       room teachers, school inspectors, or graduate university students).

     • The survey instruments (tests and questionnaires) are administered in schools
       on a specified date under the overall direction of the implementing agency.

     • The implementing agency takes responsibility for collecting survey
       instruments, for scoring, and for cleaning and preparing data for analysis.
  • The implementing agency establishes the reliability of the assessment
    instruments and procedures.

  • The implementing agency carries out the data analysis.

  • The draft reports are prepared by the implementing agency and reviewed
    by the steering committee.

  • The final reports are prepared by the implementing agency and are
    disseminated by the appropriate authority.

  • The MOE and other relevant stakeholders review the results in light of the
    policy needs that they are meant to address and determine an appropriate
    course of action.
  Source: Authors.




   It is clear from the list of elements in box 2.3 that a good deal of
thought and preparation are required before students respond to
assessment tasks. A body with responsibility for collecting data must be
appointed, decisions must be made about the policy issues to be
addressed, and tests and questionnaires must be designed and tried out.
In preparation for the actual testing, samples (or populations) of schools
and of students must be identified, schools must be contacted, and test
administrators must be selected and trained. In some countries (for
example, India, Vietnam, and some African countries), teachers have
been assessed on the tasks taken by their students (see A.1 and A.2 in
appendix A and C.1 in appendix C). Following test administration, a
lot of time and effort will be required to prepare data for analysis, to
carry out analyses, and to write reports.
   Low-income countries have to deal with problems over and above
those encountered by other countries in attempting to carry out a national
assessment. Education budgets may be meager. According to 2005
data (World Bank 2007), some countries devote 2 percent or less of
gross domestic product to public education (for example, Bangladesh,
Cameroon, Chad, the Dominican Republic, Guinea, Kazakhstan, the
Lao People’s Democratic Republic, Mauritania, Pakistan, Peru, the
Republic of Congo, United Arab Emirates, and Zambia) compared to
more than 5 percent in most middle- and high-income countries.
Competing demands within the education sector for activities such as
school construction, teacher training, and provision of educational mate-
rials can result in nonavailability of funds for monitoring educational
achievement. Furthermore, many low- and, indeed, middle-income
countries have weak institutional capacity for carrying out a national
assessment. They may also have to face additional administrative and
communication problems caused by inadequate roads, mail service, and
telephone service. Finally, the very high between-school variation in stu-
dent achievement found in some low-income countries requires a large
sample (see UNEB 2006; World Bank 2004).
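   The reason high between-school variation inflates the required sample is the design effect: when students are tested in school clusters, each additional student from the same school adds less new information than a student drawn at random from the whole population. The back-of-the-envelope calculation below uses assumed values for the intraclass correlation and the number of students tested per school; it is illustrative only and is not drawn from the studies cited above.

```python
# Design-effect arithmetic for a clustered (school-based) sample.
# rho and cluster_size are assumed values chosen for illustration.
rho = 0.4            # intraclass correlation: share of score variance lying between schools
cluster_size = 20    # students tested per selected school
n_schools = 150      # schools in the sample

deff = 1 + (cluster_size - 1) * rho       # design effect
n_students = n_schools * cluster_size
effective_n = n_students / deff           # equivalent simple random sample size

print(f"design effect = {deff:.1f}")
print(f"{n_students} clustered students are worth about {effective_n:.0f} "
      "students sampled at random")
```

   With these assumed numbers, 3,000 tested students carry roughly the information of about 350 students drawn by simple random sampling, which is why assessments in high-variation systems must include many more schools.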



HOW DOES A NATIONAL ASSESSMENT DIFFER
FROM PUBLIC EXAMINATIONS?

Public examinations play a crucial role in many education systems
in certifying student achievement, in selecting students for further
study, and in standardizing what is taught and learned in schools.
Sometimes, public examinations are thought to provide the same
information as a national assessment, thus appearing to eliminate
the need for a national assessment system in a country that has a
public examination system. However, public examinations cannot
provide the kind of information that a national assessment seeks
to provide.
   First, since public examinations play a major role in selecting stu-
dents (for the next highest level in the education system and some-
times for jobs), they seek to discriminate among relatively high-
achieving students and so may not provide adequate coverage of the
curriculum. Second, examinations, as well as the characteristics of
students who take them, change from year to year, thereby limiting
the inferences that can be made from comparisons over time. Third,
the fact that “high stakes” are attached to performance (that is, how
students do on an examination has important consequences for them
and perhaps for their teachers) means that teachers (and students)
may focus on those areas of the curriculum that are examined to the
neglect of important areas that are not examined (for example, practical
skills), so that performance on the examination does not provide an accurate
reflection of the intended curriculum.

TABLE 2.1
Differences between National Assessments and Public Examinations

Purpose
  National assessment: To provide feedback to policy makers.
  Public examination: To certify and select students.

Frequency
  National assessment: For individual subjects offered on a regular basis (such
  as every four years).
  Public examination: Annually, and more often where the system allows repeats.

Duration
  National assessment: One or two days.
  Public examination: Can extend over a few weeks.

Who is tested?
  National assessment: Usually a sample of students at a particular grade or
  age level.
  Public examination: All students who wish to take the examination at the
  examination grade level.

Format
  National assessment: Usually multiple choice and short answer.
  Public examination: Usually essay and multiple choice.

Stakes: importance for students, teachers, and others
  National assessment: Low importance.
  Public examination: Great importance.

Coverage of curriculum
  National assessment: Generally confined to one or two subjects.
  Public examination: Covers main subject areas.

Effect on teaching
  National assessment: Very little direct effect.
  Public examination: Major effect; teachers tend to teach what is expected on
  the examination.

Additional tuition sought for students
  National assessment: Very unlikely.
  Public examination: Frequently.

Do students get results?
  National assessment: Seldom.
  Public examination: Yes.

Is additional information collected from students?
  National assessment: Frequently, in student questionnaires.
  Public examination: Seldom.

Scoring
  National assessment: Usually involves statistically sophisticated techniques.
  Public examination: Usually a simple process based on a predetermined marking
  scheme.

Effect on level of student attainment
  National assessment: Unlikely to have an effect.
  Public examination: Poor results or the prospect of failure can lead to early
  dropout.

Usefulness for monitoring trends in achievement levels over time
  National assessment: Appropriate if tests are designed with monitoring in mind.
  Public examination: Not appropriate, because examination questions and
  candidate populations change from year to year.

Source: Authors.

Although there are
some exceptions, decisions about individual students, teachers, or
schools are not normally made following a national assessment.
   Fourth, information on student achievement is usually required at
an earlier age than that at which public examinations are held. Fifth,
the kind of contextual information (about teaching, resources, and stu-
dents and their homes) that is used in the interpretation of achievement
data collected in national assessments is not available to interpret public
examination results (Kellaghan 2006). Table 2.1 summarizes the major
differences between national assessments and public examinations.
CHAPTER 3

WHY CARRY OUT A NATIONAL ASSESSMENT?

A decision to carry out a national assessment might
     be made for a variety of reasons. Frequently, national assessments
     reflect the efforts of a government to “modernize” its education
     system by introducing a business management (corporatist) approach
     (Kellaghan 2003). This approach draws on concepts used in the world
     of business, such as strategic planning and a focus on deliverables and
     results, and it may involve accountability based on performance.
     Viewed from this perspective, a national assessment is a tool for
     providing feedback on a limited number of outcome measures that are
     considered important by policy makers, politicians, and the broader
     educational community.
        A key objective of this approach is to provide information on the
     operation of the education system. Many governments lack basic
     information on aspects of the system—especially student achievement
     levels—and even on basic inputs to the system. National assessments
     can provide such information, which is a key prerequisite for sound
     policy making. For example, Vietnam’s national assessment helped
     establish that many classrooms lacked basic resources (World Bank
     2004). In a similar vein, Zanzibar’s assessment reported that
     45 percent of pupils lacked a place to sit (Nassor and Mohammed
      1998). Bhutan’s national assessment noted that some students had to
spend several hours each day traveling to and from school (Bhutan,
Board of Examinations, Ministry of Education 2004). Namibia’s as-
sessment showed that many teachers had limited mastery of basic
skills in English and mathematics (Makuwa 2005).
   The need to obtain information on what students learn at school has
assumed increasing importance with the development of the
so-called knowledge economy. Some analysts argue that students will
need higher levels of knowledge and skills—particularly in the areas of
mathematics and science—than in the past if they are to participate
meaningfully in the world of work in the future. Furthermore,
because ready access to goods and services increases with globalization,
a country’s ability to compete successfully is considered to depend to a
considerable degree on the skills of workers and management in their
use of capital and technology. This factor might point to the need to
compare the performance of students in one’s education system with
the performance of students in other systems, although a danger exists
in assigning too much importance to aggregate student achievement in
accounting for economic growth, given the many other factors involved
(Kellaghan and Greaney 2001a).
   National assessments, when administered over a period of time, can
be used to determine whether standards improve, deteriorate, or
remain static. Many developing countries face the problem of expanding
enrollments, building many new schools, and training large numbers of
teachers while at the same time trying to improve the quality of educa-
tion—sometimes against a background of a decreased budget. In this
situation, governments need to monitor achievement levels to deter-
mine how changes in enrollment and budgetary conditions affect the
quality of learning. Otherwise, the risk exists that increased enrollment
rates may be readily accepted as evidence of an improvement in the
quality of education.
   National assessment data have been used to monitor achievement
over time. A series of studies in Africa between 1995/96 and 2000/01
revealed a significant decline in reading literacy scores in Malawi,
Namibia, and Zambia (see figure C.1.2 in appendix C). In the United
States, the National Assessment of Educational Progress (NAEP),
which has monitored levels of reading achievement over almost
three decades, found that although nine-year-old black and Hispanic
children reduced the achievement gap with whites up to about
1980, the test score differential remained fairly constant thereafter
(figure 3.1). Also in the United States, the NAEP helped identify
the changing levels of reading achievement in various states (figure 3.2).
In Nepal, results of national assessments were used to monitor (a)
changes in achievement over the period 1997–2001 and, in particular,
(b) effects of policy decisions relating to budget, curricula, textbooks,
teaching materials, and teacher development (see A.6 in appendix A).
   When national assessment data are used to monitor achievement
over time, the same test should be used in each assessment or, if
different tests are used, some items should be common, so that perfor-
mance on the tests can be equated or linked. In either case, the com-
mon items should be kept secure so that student or teacher familiarity
with their content does not invalidate the comparisons being made.
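
   The sketch below (in Python, with purely hypothetical numbers)
illustrates one simple way of linking two administrations through
common items by matching the mean and spread of anchor-item scores.
It is offered only to illustrate the idea and is not the procedure
used in any particular national assessment.

# A minimal sketch of linear linking through common (anchor) items:
# scores from the second assessment are placed on the first
# assessment's scale. All data below are hypothetical.
import numpy as np

# Percentage-correct scores on the anchor items only
anchor_year1 = np.array([52.0, 61.0, 47.0, 70.0, 58.0])  # first assessment
anchor_year2 = np.array([49.0, 65.0, 44.0, 68.0, 60.0])  # second assessment

# Mean-sigma transformation: year 2 scores -> year 1 reporting scale
slope = anchor_year1.std(ddof=1) / anchor_year2.std(ddof=1)
intercept = anchor_year1.mean() - slope * anchor_year2.mean()

def link_to_year1(score_year2: float) -> float:
    """Express a second-assessment score on the first assessment's scale."""
    return slope * score_year2 + intercept

print(round(link_to_year1(55.0), 1))
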
   Other uses that can be made of a national assessment depend on
whether data were collected in a sample of schools or in a census in
which information is obtained about all (or most) schools. In both
cases, results can be used to provide direction to policy makers who
are interested in enhancing educational quality. For example, the
results can help governments identify the strength of the association
between the quality of student learning and various factors over which
they have some control (for example, availability of textbooks, class
size, and number of years of teacher preservice training).

 FIGURE 3.1

 The Achievement Gap in the United States for Nine-Year-Old Students: NAEP
 Reading Assessment, 1971–99

 [Line graph of average reading scores of white, black, and Hispanic
 nine-year-olds, 1971–99.]

Source: Winograd and Thorstensen 2004.
 FIGURE 3.2

 Percentages of Fourth Grade Students at or above “Proficient” in Reading,
 NAEP 1992–2003

 [Line graph of the percentage of students at or above “proficient,”
 1992–2003, in Delaware, Kentucky, Maryland, New Mexico, North Carolina,
 South Carolina, and Texas.]

Source: Winograd and Thorstensen 2004.



   An analysis of findings can lead to decisions affecting the provision
of resources in the education system in general (for example, for the
reform of curricula and textbooks or for teacher development) or in
categories of schools with particular characteristics (for example,
schools in rural areas or schools serving students in socioeconomi-
cally disadvantaged areas). Many examples can be found of the use
of the findings of national and international assessments for such
purposes. They have been used in Australia to provide programs
designed to increase the participation and performance of girls in
mathematics and science (Keeves 1995); they have prompted
curriculum reform in low- and middle-income countries (Elley 2005),
have helped divert financial resources to poorer schools in Chile (see
A.7 in appendix A), and have promoted teacher professionalism in
Uruguay (see A.3 in appendix A).
   The results of a national assessment may also be used to change
practice in the classroom (Horn, Wolff, and Velez 1992). Getting
information to teachers and effecting changes in their behavior that
will substantially raise the achievements of students, however, is not
an easy task. The pressure on schools and classrooms to change is
greatest when the results of a national assessment are based on a cen-
sus, not a sample, and when high stakes are attached to performance.
The authorities may take no specific action beyond publishing
information about performance (for example, in league tables), or they
may attach sanctions to performance. Sanctions can
take the form of rewards for improved performance (for example,
schools, teachers, or both receive economic incentives if students
achieve a specific target) or “punishment” for poor performance (for
example, nonpromotion of students or dismissal of teachers) (see A.7
in appendix A for a brief description of Chile’s reward program).
   When a national assessment obtains information about the achieve-
ment of students in all (or most) schools, some policy makers may see
an opportunity to use these data to judge the quality of teachers and
schools. Obviously, teachers and students should bear some responsi-
bility for learning, but the role of institutions, agencies, and individuals
that exercise control over the resources and activities of schools
should also be reflected in an accountability system. Apportioning
fairly the responsibilities of all stakeholders is important, whether an
assessment is sample-based or census-based. The national assessment
in Uruguay provides a good example of recognition of the responsi-
bility of a variety of stakeholders (including the state) for student
achievement (see A.3 in appendix A).
   In some cases, a national assessment may simply have a symbolic
role, which is designed to legitimate state action by embracing inter-
nationally accepted models of modernity and by imbuing the policy-
making process with the guise of scientific rationality (Benveniste
2000, 2002; Kellaghan 2003). When this role motivates a national
assessment, the act of assessment has greater significance than its
outcomes. If a national assessment is carried out simply to meet the
requirement of a donor agency, or even to meet a government’s
international commitments to monitor progress toward achieving
the Millennium Development Goals, it may have little more than
symbolic value, and its findings may not be seriously considered in the
management of the education system or in policy making.
CHAPTER 4

DECISIONS IN A NATIONAL ASSESSMENT


   In this chapter, we consider 12 decisions that are involved in
planning a national assessment (see Greaney and Kellaghan 1996;
Kellaghan 1997; and Kellaghan and Greaney 2001b, 2004).



     WHO SHOULD GIVE POLICY GUIDANCE FOR
     THE NATIONAL ASSESSMENT?

     The ministry of education should appoint a national steering commit-
     tee (NSC) to provide overall guidance to the agency that will carry out
     the assessment. The committee can help ensure that the national
     assessment has status and that key policy questions of interest to the
     ministry and others are addressed. It could also help resolve serious
     administrative and financial problems that might arise during the
     implementation of the national assessment. Giving the NSC a degree
     of ownership over the direction and intent of the national assessment
     also increases the likelihood that the results of the assessment will play
     a role in future policy making.
        The composition of an NSC will vary from country to country,
     depending on the power structure within the education system. In
addition to representatives of the ministry of education, NSCs might
include representatives of major ethnic, religious, and linguistic groups,
as well as those groups whose members will be expected to act on
the results (such as teacher trainers, teachers, school inspectors, and
curriculum personnel). Box 4.1 lists suggested members of a steering
committee for a national assessment in Sierra Leone proposed by
participants at an international workshop. Addressing the informa-
tion needs of those various stakeholders should help ensure that the
national assessment exercise does not result in a report that is
criticized or ignored because of its failure to address the “correct”
questions.
   The NSC should not be overburdened with meetings and should
not be required to address routine implementation tasks related to
the national assessment. In some cases, the NSC may provide direc-
tion at the initial stage by identifying the purpose of and rationale
for the assessment, by determining the curriculum areas and grade lev-
els to be assessed, or by selecting the agency or agencies to conduct the
assessment, although those items may also be decided before the
committee is established. The NSC is likely to be most active at the


 BOX 4.1


     Proposed NSC Membership in Sierra Leone
     • Basic Education Commission

     • Civil Society Movement

     • Decentralized Secretariat

     • Director-General of Education (chair)

     • Education Planning Directorate

     • Inter-Religious Council

     • National Curriculum Research Development Centre

     • Sierra Leone Teachers Union

     • Statistics Sierra Leone

     • Teacher Training Colleges

     • West African Examinations Council
start of the assessment exercise, whereas the implementing agency
will be responsible for most of the detailed work, such as instrument
development, sampling, analysis, and reporting. The implementing
agency, however, should provide the NSC with draft copies of tests
and questionnaires and with descriptions of proposed procedures so
that committee members can provide guidance and can ensure that
the information needs that prompted the assessment in the first
place are being adequately addressed. NSC members should also
review draft reports prepared by the implementing agency.

Responsibility for providing policy guidance: Ministry of education



WHO SHOULD CARRY OUT THE NATIONAL ASSESSMENT?

A national assessment should be carried out by a credible team or orga-
nization whose work can command respect and enhance the likelihood
of broad-scale acceptance of the findings. Various countries have as-
signed responsibility for national assessments to groups ranging from
teams set up within the ministry of education, to autonomous bodies
(universities, research centers), to nonnational technical teams. We
would expect a variety of factors to influence such a decision, including
levels of national technical capacity, as well as administrative and
political circumstances. Table 4.1 lists some potential advantages and
disadvantages of different categories of implementation agencies that
merit consideration in deciding who should carry out an assessment.
   In some cases, traditions and legislation may impose restrictions on
the freedom of a ministry of education in choosing an implementing
agency. In Argentina, for example, provinces must authorize the cur-
ricular contents to be evaluated in the national assessment. Initially,
provinces were asked to produce test items; however, many provinces
lacked the technical capacity to do so. At a later stage, provinces were
presented with a set of sample questions for their endorsement and
the Dirección Nacional de Información y Evaluación de la Calidad
Educativa (DiNIECE) constructed the final assessment instruments
from the pool of preapproved test items. More recently, test items
have been designed independently by university personnel and
approved by the national Federal Council. The DiNIECE remains
responsible for the design of achievement tests, the analyses of
results, and the general coordination of annual assessment activities.

TABLE 4.1
Options for Implementing a National Assessment

Agency drawn from staff of ministry of education
Advantages: Likely to be trusted by the ministry. Enjoys ready access
to key personnel, materials, and data (for example, school population
data). Funds may not have to be secured for staff time.
Disadvantages: Findings might be subject to political manipulation,
including suppression. May be viewed skeptically by other stakeholders.
Staff may be required to undertake many other tasks. Technical capacity
may be lacking.

Agency drawn from staff of public examination unit
Advantages: Usually is credible. Has experience in running secure
assessments. Funds may not have to be secured for staff time. Some
skills (for example, test development) can be transferred to enhance
the examination unit. More likely to be sustainable than some other
models.
Disadvantages: Staff may be required to undertake many other tasks.
Technical capacity may be weak. May lack ready access to data. Public
examination experience may result in test items that are too difficult.

Agency drawn from research/university sector
Advantages: Findings may be more credible with stakeholders. Greater
likelihood of some technical competence. May use data for further
studies of the education system.
Disadvantages: Has to raise funds to cover staff costs. May be less
sustainable than some other models. May come into conflict with the
education ministry.

Agency recruited as foreign technical assistance (TA)
Advantages: More likely to be technically competent. Nature of funding
can help ensure timely completion.
Disadvantages: Likely to be expensive. May not be sensitive to the
educational context. Difficult to ensure assessment sustainability.
Possibly little national capacity enhancement.

National team supported with some international TA
Advantages: Can improve technical capacity of nationals. May ensure
timely completion. May add credibility to the results.
Disadvantages: Possibly difficult to coordinate work of national team
members and TA. Might be difficult to ensure skill transfer to
nationals.

Ministry team supported with national TA
Advantages: Can ensure ministry support while obtaining national TA.
Less expensive than international TA.
Disadvantages: National TA may lack the necessary technical capacity.
Other potential disadvantages listed under ministry of education may
also apply.

Source: Authors.
   It is worth reflecting on the wide variety of skills that are required
to carry out a national assessment in deciding who should be given
responsibility for the task. This issue is addressed in more detail in
Implementing a National Assessment of Educational Achievement
(book 3 in this series). A national assessment is fundamentally a
team effort. The team should be flexible, willing to work under
pressure and in a collaborative manner, and prepared to learn new
assessment and technological approaches. The team leader should
have strong managerial skills. He or she will be required to organize
the staff, to coordinate and schedule activities, to support training,
and to arrange and monitor finance. The team leader should be
politically astute because he or she will need to report to an NSC
and to be a liaison with national, regional, and, in some instances,
district-level government bodies and representatives of stakeholders
(such as teachers and religious bodies).
   The team should have high-level implementation or operational
skills. Tasks to be completed include organizing workshops for item
writers and test administrators; arranging for printing and distribu-
tion of tests, questionnaires, and manuals; contacting schools;
developing training materials; and collecting and recording data. A
small dedicated team of test developers will be needed to analyze
the curriculum, develop tables of specifications or a test blueprint,
draft items, select items after pretesting or piloting, and advise on
scoring. Following test administration, open-ended and multiple-
choice questions have to be scored.
   The team will require support from one or more people with statis-
tical and analytical competence in selecting samples, in weighting data,
in data input and file preparation, in item analysis of test data as well as
general statistical analysis of the overall results, and in preparing data
files for others (for example, academics and postgraduate students) to
carry out secondary analyses. Many developing countries lack capacity
in this last area, leading to situations in which data are collected but
never adequately analyzed or reported.
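
   By way of illustration of the weighting work mentioned above, the
following sketch (in Python, with hypothetical strata and counts) shows
how design weights might be computed so that each tested student
represents the appropriate number of students in the population; actual
weighting procedures are considerably more elaborate.

# A minimal, hypothetical sketch of design weights for a stratified
# sample of schools: each sampled student is weighted by the inverse of
# the probability that his or her school and classmates were selected.
strata = {
    # stratum: (schools in population, schools sampled)
    "urban": (400, 40),
    "rural": (1600, 80),
}

def school_weight(stratum: str) -> float:
    """Inverse of a school's selection probability within its stratum."""
    n_population, n_sampled = strata[stratum]
    return n_population / n_sampled

def student_weight(stratum: str, class_size: int, students_tested: int) -> float:
    """School weight times the inverse of the within-school sampling fraction."""
    return school_weight(stratum) * (class_size / students_tested)

print(student_weight("rural", class_size=45, students_tested=20))  # 45.0
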
   The team should have the necessary personnel to draft and dissemi-
nate results, press releases, and focused pamphlets or newsletters.
It might also be reasonably expected to play a key role in organizing
workshops for teachers and other education officials so they can discuss
the importance of the results and the results’ implications for teaching
and learning.
   Most members of the team may work part time and be employed
as needed. This category could include item writers—especially prac-
ticing teachers with a good knowledge of the curriculum—and experts
in sampling and statistical analysis. Team members might be recruited
from outside the education sector. For example, a national census
bureau can be a good source of sampling expertise. Computer person-
nel with relevant experience could help with data cleaning, and
journalists could assist with drafting catchy press releases. Neither
Cambodia nor Ethiopia employed full-time staff members to carry
out its national assessment.

Responsibility for carrying out national assessment: Implementation
agency (ministry of education, examination board, research agency,
university).



WHO WILL ADMINISTER THE TESTS AND QUESTIONNAIRES?

National administrative traditions and perceptions of levels of trust, as
well as sources of finance, tend to influence the selection of personnel
responsible for administering tests and questionnaires in a national
assessment. Practice varies. For example, some countries have used
graduate students, while Zambia has involved school inspectors and
ministry officials in test and questionnaire administration. Other coun-
tries have used experienced teachers drawn from nonparticipating
schools or retired teachers. In the Maldives, a test administrator must
be a staff member of a school located on an island other than the island
where the targeted school is located.
   Test administrators should be carefully selected. They should have
good organizational skills, have experience of working in schools, and
be committed to following test and questionnaire guidelines precisely.
Ideally, they should have classroom experience, speak in the same
language and accent as the students, and have an authoritative but
nonthreatening manner. Book 3 of this series, Implementing a National
Assessment of Educational Achievement, considers the advantages and
disadvantages of having teachers, inspectors, teacher trainers, examina-
tion board personnel, and university students as administrators.
   Although using the teachers of participating students as test
administrators may appear administratively convenient and very
cost-effective, it is rarely done, for a variety of reasons. Some
teachers might feel that their teaching effectiveness is
being evaluated. Some may find it difficult to desist from their normal
practice of trying to help students and might not be able to adjust to
the formal testing approach. Some may make copies of tests or test
items, thus ruling out the possibility of using those items in future
national assessments. Having teachers administer tests to their own
students might also diminish the public perception of the trustworthi-
ness of the assessment results.

Responsibility for administering tests and questionnaires: Implementation
agency



WHAT POPULATION WILL BE ASSESSED?

As the term is usually understood, national assessments refer to surveys
carried out in education systems. This connotation, however, was not
always the case. When the first national assessment was carried out in
the United States (in 1969), out-of-school populations (17- and 18-
year-olds and young adults 26–35 years of age), as well as school-going
populations, were assessed (in citizenship, reading, and science). The
assessment of the out-of-school populations was discontinued, how-
ever, because of cost (Jones 2003). Subsequent surveys of adult literacy
were carried out independent of national assessments.
   The issue of assessing younger out-of-school children is more
relevant in many developing countries than in the United States
because many children of school-going age do not attend school.
Obviously, the achievements (or lack of them) of those children are
of interest to policy makers and politicians and may have particular
relevance for the nonformal education sector. Their inclusion in a
conventional national assessment is, however, difficult to envisage.
Although particular groups of out-of-school youth might be assessed
using national assessment tests in a separate study, methods of
assessment and sampling procedures generally would be very differ-
ent, and the varying circumstances of such children (for example,
special needs, socioeconomic disadvantage, or distance from school)
would have to be taken into account.
   As far as school-going children are concerned, policy makers want
information about their knowledge and skills at selected points in
their educational careers. A decision has to be made about whether
populations are defined on the basis of age or grade or, indeed, by a
combination of age and grade. In countries where students vary widely
in the age at which they enter school, and in which policies of non-
promotion are in operation, students of similar age will not be con-
centrated in the same grade. In this situation, a strong argument can
be made for targeting grade level rather than age.
   The grade to be assessed should normally be dictated by the infor-
mation needs of the ministry of education. If, for example, the min-
istry is interested in finding out about the learning achievement
levels of students completing primary school, it might request that a
national assessment be carried out toward the end of the last year of
primary school (fifth or sixth grade in many countries). The ministry
could also request a national assessment in third or fourth grade if it
needed data on how students are performing midway through the
basic education cycle. This information could then be used to intro-
duce remedial measures (such as in-service courses for teachers) to
address problems with specific aspects of the curriculum identified in
the assessment.
   Target grades for national assessments have varied from country to
country. In the United States, student achievement levels are assessed
at grades 4, 8, and 12; in Colombia, achievement is assessed at grades
3, 5, 7, and 9; in Uruguay, at preschool and at grades 1, 2, and 6; and
in Sri Lanka, at grades 4, 8, and 10. In anglophone Africa, a regional
consortium of education systems, the Southern and Eastern Africa
Consortium for Monitoring Educational Quality (SACMEQ), assessed
grade 6 students. Countries in the francophone African consortium
Programme d’Analyse des Systèmes Educatifs de la CONFEMEN
(Conférence des Ministres de l’Education des Pays ayant le Français en
Partage) assessed students in grades 2 and 5.
   Sometimes pragmatic considerations dictate grade selection. The
Nigerian Federal Ministry of Education decided to assess students in
grade 4 because testing at any lower level would have required trans-
lation of tests into many local languages. More senior grades were not
considered suitable because students and teachers would be focused
on secondary-school entrance examinations.
   Relatively few countries conduct large-scale assessments in grades
1 to 3. Students at that level might not be able to follow instructions
or to cope with the cognitive tasks of the assessment or with the chal-
lenge of completing multiple-choice tests. A Jamaican study noted
that a sizable number of grade 1 students were unable to recognize
the letters of the alphabet (Lockheed and Harris 2005). Neverthe-
less, we should bear in mind that because information about early
student learning patterns may be critical to reform efforts, alternative
procedures to monitor those patterns should be in place.

Responsibility for selecting population to be assessed: Ministry of educa-
tion and NSC



WILL A WHOLE POPULATION OR A SAMPLE BE ASSESSED?

Most national and all regional and international studies use sample-
based approaches in determining national achievement levels. Some
national assessments have used both census- and sample-based ap-
proaches (for example, Costa Rica, Cuba, France, Honduras, Jordan,
Mexico, and Uruguay), whereas most subnational assessments collect
census data (for example, Minas Gerais, Parana, and São Paulo, Brazil;
Bogotá, Colombia; and Aguascalientes, Mexico) (see Crespo, Soares,
and deMello e Souza 2000). Several factors favor the use of a sample if
the objective is to obtain information for policy purposes on the func-
tioning of the education system as a whole. Those factors include
 (a) reduced costs in test administration and in cleaning and managing
data, (b) less time required for analysis and reporting, and (c) greater
accuracy because of the possibility of providing more intense supervi-
sion of fieldwork and data preparation (Ross 1987).
   As noted in chapter 3, the purpose of an assessment is key in deter-
mining whether to test a sample or the entire population of targeted
students.
TABLE 4.2
Advantages and Disadvantages of Census-Based Assessment to Hold
Schools Accountable

Advantage: Focuses on what are considered important aspects of
education.
Disadvantage: Tends to lead to neglect of subject areas that are not
tested.

Advantage: Highlights important aspects of individual subjects.
Disadvantage: Tends to lead to neglect of aspects of subjects that are
not tested (such as oral fluency in language).

Advantage: Helps ensure that students reach an acceptable standard
before promotion.
Disadvantage: Has contributed to early dropout and nonpromotion.

Advantage: Allows for direct comparisons of schools.
Disadvantage: Leads to unfair ranking of schools where different social
backgrounds are served and where results are not significantly
different.

Advantage: Builds public confidence in the performance of the system.
Disadvantage: Has led to cheating during test administration and to
subsequent doctoring of results.

Advantage: Puts pressure on students to learn.
Disadvantage: Tends to emphasize memorization and rote learning.

Advantage: Results in some schools and students raising test
performance levels.
Disadvantage: Improved performance may be limited to a particular test
and will not be evident on other tests of the same subject area.

Advantage: Allows parents to judge the effectiveness of individual
schools and teachers.
Disadvantage: Leads to unfair assessment of effectiveness on the basis
of test score performance rather than taking into account other
established factors related to learning achievement.

Advantage: Tends to be popular with politicians and media.
Disadvantage: Seldom holds politicians accountable for failure to
support delivery of educational resources.

Source: Authors.


   On the one hand, the decision to involve an entire population
may reflect an intention to foster school, teacher, or even student
accountability. It facilitates the use of sanctions (incentives or penalties),
the provision of feedback to individual schools on performance, and the
publication of league tables, as well as the identification of schools with
the greatest need for assistance (for example, as in Chile and Mexico).
On the other hand, the sample-based approach will permit the detec-
tion of problems only at the system level. It will not identify specific
schools in need of support, although it can identify types or categories
of schools (for example, small rural schools) that require attention. It
can also identify problems relating to gender or ethnic equity.
   An argument against the use of a sample-based approach is that
because the assessment does not have high stakes attached to perfor-
mance, some students will not be motivated to take the test seriously.
That was not the case, however, in many countries—including South
Africa—where some students were afraid that performance on the
Trends in International Mathematics and Science Study (TIMSS)
tests would count toward their official school results. It is interesting
to note that cheating occurred during test administration, presumably
because of the perception that relatively high stakes were attached to
performance (see A.4 in appendix A).
   Advantages and disadvantages of using a national assessment to
hold schools, teachers, and sometimes students accountable are set
out in table 4.2. The topics listed are derived for the most part from
studies of the effects of high-stakes public examinations, not from a
study of national assessments. Nevertheless, they should be relevant
to census-based national assessments, at least to ones that act as sur-
rogate public examinations (as in the United States and some Latin
American countries).

Responsibility for deciding whether to use a sample or census: Ministry of
education



WHAT WILL BE ASSESSED?

All national assessments measure cognitive outcomes of instruction or
scholastic skills in the areas of language/literacy and mathematics/
numeracy, a reflection of the importance of those outcomes for basic
education. In some countries, knowledge of science and social studies
is included in an assessment. Whatever the domain of the assessment,
providing an appropriate framework is important, in the first instance
for constructing assessment instruments and afterward for interpreting
results. The framework may be available in a curriculum document if,
for example, the document provides expectations for learning that are
clearly prioritized and put into operation. In most cases, however, such
a framework will not be available, and those charged with the national
assessment will have to construct it. In that task, close cooperation will
be required between the assessment agency, those responsible for cur-
ricula, and other stakeholders.
    Assessment frameworks attempt to clarify in detail what is being
assessed in a large-scale assessment, how it is being assessed, and why
it is being assessed (see Kirsch 2001). The aim of the framework is to
make the assessment process and the assumptions behind it transpar-
ent, not just for test developers but also for a much larger audience,
including teachers, curriculum personnel, and policy makers. The
framework usually starts with a general definition or statement of pur-
pose that guides the rationale for the assessment and that specifies
what should be measured in terms of knowledge, skills, and other
attributes. It then identifies and describes various performances or be-
haviors that will reveal those constructs by identifying a specific num-
ber of characteristic tasks or variables to be used in developing the
assessment, and it indicates how those performances are to be used to
assess student performance (Mullis and others 2006).
    Many national assessments have been based on a content analysis
at a particular grade level of what students are expected to have
learned as a result of exposure to a prescribed or intended curricu-
lum. Typically, this analysis is done in a matrix with cognitive behaviors
on the horizontal axis and with themes or content areas on the verti-
cal axis. Thus, the intersection of a cognitive behavior and content
area will represent a learning objective. Cells may be weighted in
terms of their importance.
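    As a hypothetical illustration of such a matrix, the short sketch
below (in Python) records illustrative item counts for each content-by-
behavior cell of a test blueprint and derives the share of the test
devoted to each content area; the areas, behaviors, and weights are
invented for the example.

# A hypothetical table of specifications (test blueprint): content
# areas on one axis, cognitive behaviors on the other, with cell values
# giving the intended number of items for each learning objective.
content_areas = ["number", "measurement", "geometry", "data"]
behaviors = ["knowing", "applying", "reasoning"]

blueprint = {
    ("number", "knowing"): 8, ("number", "applying"): 6, ("number", "reasoning"): 2,
    ("measurement", "knowing"): 4, ("measurement", "applying"): 4, ("measurement", "reasoning"): 2,
    ("geometry", "knowing"): 3, ("geometry", "applying"): 3, ("geometry", "reasoning"): 2,
    ("data", "knowing"): 2, ("data", "applying"): 2, ("data", "reasoning"): 2,
}

total_items = sum(blueprint.values())
for area in content_areas:
    share = sum(n for (a, _), n in blueprint.items() if a == area) / total_items
    print(f"{area}: {share:.0%} of the test")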
    Recent national (and international) assessments have drawn on
research relating to the development in students of literacy and
numeracy skills that may or may not be represented in national cur-
ricula. For example, in the International Association for the Evalua-
tion of Educational Achievement (IEA) Framework and Specifications
document for the Progress in International Reading Literacy Study
(PIRLS) 2006, reading literacy is defined as “the ability to understand
and use those written language forms required by society and/or
valued by the individual. Young readers can construct meaning from
a variety of texts. They read to learn, to participate in communities of
readers in school and everyday life, and for enjoyment” (Mullis and
others 2006, 3). From this definition it is evident that reading is much
more than decoding text or getting the meaning of a passage or poem.
PIRLS further clarified what it proposed to measure by indicating the
process and tasks to be assessed and the percentages of test items
devoted to each (table 4.3).
TABLE 4.3
PIRLS Reading Comprehension Processes

Focus on and retrieve explicitly stated information (20 percent of items)
Examples of tasks: Looking for specific ideas. Finding definitions or
phrases. Identifying the setting for a story (for example, time, place).
Finding the topic sentence or main idea (explicitly stated).

Make straightforward inferences (30 percent of items)
Examples of tasks: Inferring that one event caused another. Identifying
generalizations in the text. Describing the relationship between
characters. Determining the referent of a pronoun.

Interpret and integrate ideas and information (30 percent of items)
Examples of tasks: Determining the overall message or theme. Contrasting
text information. Inferring a story’s mood or tone. Interpreting a
real-world application of text information.

Examine and evaluate content, language, and textual elements
(20 percent of items)
Examples of tasks: Evaluating the likelihood that the events described
could happen. Describing how the author devised a surprise ending.
Judging the completeness or clarity of information in the text.
Determining the author’s perspectives.

Source: Campbell and others 2001; Mullis and others 2006.

   The framework document specified that the assessment would use
test booklets with five literary and five informational passages and
that each passage would be followed by 12 questions, half of which
would be multiple choice and half would be constructed response. It
also indicated that because reading attitudes and behaviors were
important for the development of a lifelong reading habit and were
related to reading achievement, PIRLS would include items in the
student questionnaire to assess student reading attitudes and behav-
iors. It justified its selection of students in the fourth year of formal
schooling as the target population for the assessment on the basis that
the fourth year represented the transition stage from learning to read
to reading to learn.
   In its assessment framework, PIRLS recognized two main purposes
that students have for reading:

• Reading for literacy experience
• Reading to acquire and use information.

    It also gave a detailed justification for the emphasis that PIRLS
placed on finding out more about the environment and the context
in which students learn to read. This emphasis led to the inclusion
of questionnaire items on home characteristics that can encourage
children to learn to read: literacy-related activities of parents, lan-
guage spoken in the home, links between the home and the school,
and students’ out-of-school literacy activities. School-level items
covered school resources that can directly or indirectly affect read-
ing achievement. The framework document also justified assessing
classroom variables, such as instructional approaches and the nature
of teacher training.
    A further alternative to basing an assessment instrument on
curriculum-embedded expectations or prescriptions, which is feasible
in the case of older students, is to build a test to reflect the knowledge
and skills that students are likely to need and build on in adult life. The
Programme for International Student Assessment (PISA) provided an
example of this method when it set out to assess the “mathematical
literacy” of 15-year-olds, defined as the “capacity to identify and under-
stand the role that mathematics plays in the world, to make well-
founded judgements and to use and engage with mathematics in ways
that meet the needs of the individual’s life as a constructive, concerned,
and reflective citizen” (OECD 2003, 24) (see B.3 in appendix B).
Although this approach fitted well in an international study, given that
the alternative of devising an assessment instrument that would be
equally appropriate to a variety of curricula is obviously problematic, it
might also be used in a national assessment.
   A few national assessments have collected information on affec-
tive outcomes (for example, student attitudes to school and student
self-esteem). In Colombia, for example, students’ attitudes to peace
are assessed. Although those outcomes are very important, their
measurement tends to be less reliable than the measurement of cog-
nitive outcomes, and analyses based on them have proved difficult
to interpret. In Chile, technical difficulties in measuring student
values and attitudes to learning led to abandoning those areas (see
A.7 in appendix A).
   One large-scale assessment (Monitoring Learning Achievement)
assessed “life skills,” defined as students’ knowledge of, and attitudes
toward, health and nutrition, environment, civic responsibility, and
science and technology (Chinapah 1997). While it is generally
accepted that life skills are important and should be taught, there is
considerable disagreement about their precise nature. Their measure-
ment has also proven difficult.
   Most national assessments collect information on student, school,
and home factors that are considered relevant to student achievement
(for example, student gender and educational history, including grade
repetition; resources in schools, including the availability of textbooks;
level of teacher education and qualifications; and socioeconomic
status of students’ families). The information is normally collected in
questionnaires (and sometimes in interviews) administered to stu-
dents, to teachers, to principal teachers, and sometimes to parents at
the same time as the assessment instruments are administered.
   Identification of contextual factors related to student achievement
can help identify manipulable variables, that is, factors that can be
altered by policy makers, such as regulations about the time allocated
to curriculum areas, textbook provision, and class size. The contextual
data collected in some national (and international) studies, however,
cannot play this role because they do not adequately measure the con-
ditions in which students live. Economic status, for example, may be
based on a list of items that includes a car, a television set, and a water
tap in a country where the majority of the population lives at least part
of the year on less than the equivalent of US$1 a day. Furthermore,
despite the relevance of health status and nutritional status, no infor-
mation may be obtained about them (Naumann 2005).
   In some assessments, teachers’ (as well as pupils’) achievements
have been assessed. In Vietnam (see A.2 in appendix A) and a number
of African countries in the SACMEQ studies (see C.1 in appendix C),
teachers were required to take the same test items as their students to
gain some insight into teachers’ levels of subject mastery. In Uganda,
information was obtained on the extent to which teachers claimed to
be familiar with key official curriculum documents.

Responsibility for deciding what will be assessed: Ministry of education,
NSC, with input from implementation agency.



HOW WILL ACHIEVEMENT BE ASSESSED?

An instrument or instruments must be devised that will provide the
information that the national assessment is meant to obtain. Because
the purposes and proposed uses of national assessments vary, so too
will the instruments used in the assessments and the ways results
are reported.
   Some national assessments present results in terms of the charac-
teristics of the distribution of test scores—for example, the mean
percentage of items that students answered correctly and the way
scores were distributed around the mean. Or results might be scaled
to an arbitrary mean (such as 500) and standard deviation (such as
100). Although these scores can be used to compare the perfor-
mance of subgroups in the sample, they are limited in their use in a
national assessment, primarily because they tell us little about stu-
dents’ level of subject matter knowledge or the actual skills that
students have acquired.
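   The following minimal sketch (in Python, with invented raw scores)
shows the arithmetic behind rescaling scores to an arbitrary reporting
scale with a mean of 500 and a standard deviation of 100, as mentioned
above.

# Rescale raw test scores to a reporting scale (mean 500, SD 100).
# The raw scores are hypothetical.
import numpy as np

raw = np.array([12, 18, 25, 31, 36, 40], dtype=float)
scaled = 500 + 100 * (raw - raw.mean()) / raw.std(ddof=1)
print(scaled.round(1))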
   To address this issue, and to make the results of an assessment more
meaningful for stakeholders, an increasing number of national assess-
ments seek to report results in a way that specifies what students know
and do not know and that identifies strengths and weaknesses in their
knowledge and skills. This approach involves matching student scores
with descriptions of the tasks they are able to do (for example, “can
read at a specified level of comprehension” or “can carry out basic
mathematical operations”). Performances may be categorized in vari-
ous ways (for example, “satisfactory” or “unsatisfactory”; “basic,” “pro-
ficient,” or “advanced”), and the proportion of students achieving at
each level determined. Matching student scores to performance levels
is a complex task involving the judgment of curriculum experts and
statistical analysts.
   The way in which results will be described should be a consider-
ation at the test development stage. Thus, test development might
begin with specification of a framework in which expectations for
learning are posited, following which test items are written to assess
the extent to which students meet those expectations. If items do not
meet certain criteria when tried out, however, including the extent to
which they discriminate between students, they may not be included
in the final assessment instrument. Care should be taken to ensure
that important curriculum objectives are reflected in an assessment,
even if no students in the trial provide evidence of achieving them.
   Most national and international assessments rely to a consider-
able extent on the multiple-choice format in their instruments.
Those items will often be supplemented by open-ended items that
require the student to write a word, phrase, or sentence. Examples
of multiple-choice and open-ended items are provided in box 4.2
and box 4.3, respectively.
   In several national (for example, the U.S. NAEP and Ireland’s
National Assessment of English Reading) and international assessments
(for example, TIMSS and PISA), each student responds to only a frac-
tion of the total number of items used in an assessment (see A.8 in
appendix A; B.1 and B.3 in appendix B). This approach increases
overall test coverage of the curriculum without placing too great a
burden on individual students. It also allows the use of extended
passages (for example, a short story or a newspaper article) in the
assessment of reading comprehension. In other assessments, all stu-
dents respond to the same set of items. Although some advantages are
associated with having individual students respond to only a fraction
of items, disadvantages also exist, particularly for countries beginning
a national assessment program. Administration (for example, print-
ing and distribution) is more complex, as is scoring and scaling of
scores, while analyses involving individual student or school data can
be problematic (see Sofroniou and Kellaghan 2004).

BOX 4.2

 Examples of Multiple-Choice Items

 Subject: Geography

 The river Volga is in

 A. China
 B. Germany
 C. Russia
 D. Sweden.

 Subject: Mathematics

 A seal has to breathe even if it is asleep. Martin observed a seal for
 one hour. At the start of this observation, the seal dived to the bottom
 of the sea and started to sleep. In eight minutes, it slowly floated to
 the surface and took a breath. In three minutes, it was back at the
 bottom of the sea again, and the whole process started over in a very
 regular way. After one hour, the seal was

 A. at the bottom
 B. on its way up
 C. breathing
 D. on its way down.
 Source: Mathematics example: OECD 2007. Reproduced with permission.


BOX 4.3

 Examples of Open-Ended Items

 Subject: Language

 TALL is the opposite of SMALL.

 What is the opposite of

 QUICK __________        DARK __________

 HEAVY __________        OLD  __________

 Subject: Mathematics

 Use your ruler to draw a rectangle with a perimeter of 20 centimeters.
 Label the width and the length.
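
   To illustrate the rotated-booklet approach described above (in which
each student responds to only a fraction of the items), the hypothetical
sketch below (in Python) splits an item pool into blocks and assigns
each student a booklet containing only some of the blocks. Real designs
balance block positions and pairings much more carefully; the labels
here are invented.

# A hypothetical rotated-booklet (matrix sampling) design: the item
# pool is split into blocks, and each booklet carries only two blocks,
# so each student sees a fraction of the items while the assessment as
# a whole covers the full pool.
from itertools import combinations

blocks = ["A", "B", "C", "D"]              # blocks of items from the pool
booklets = list(combinations(blocks, 2))   # each booklet carries 2 blocks

for i, booklet in enumerate(booklets, start=1):
    print(f"Booklet {i}: blocks {' and '.join(booklet)}")

# Assign booklets to students in rotation within each sampled class
students = [f"student_{n}" for n in range(1, 11)]
assignment = {s: booklets[i % len(booklets)] for i, s in enumerate(students)}
print(assignment["student_7"])
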
   The issue of language of assessment is generally accorded less
attention than it deserves. It is associated with two problems. First,
although in many countries large minority (and sometimes majority)
groups are present for whom the language of instruction is not their
mother tongue, students are usually assessed in the language of
instruction. In Uganda, for example, the vast majority of students take
tests in their second language (see A.9 in appendix A). Poor perfor-
mance on tests is attributed to this practice, as are the generally poor
scholastic progress of students and early dropout rates from school
(Naumann 2005).
   A second problem relating to language arises if the instruments of
the assessment need to be translated into one or more languages. If
comparisons are to be made between performances assessed in differ-
ent languages, analysis must take into account the possibility that
differences that may emerge may be attributable to language-related
differences in the difficulty of assessment tasks. The issue is partly
addressed by changing words. For example, in an international assess-
ment carried out in South Africa, words such as “gasoline” (“petrol”)
and “flashlight” (“torch”) were changed. Ghana replaced the word
“snow” with “rain.” If language differences co-vary with cultural and
economic factors, the problem is compounded because it may be dif-
ficult to ensure the equivalence of the way questions are phrased and
the cultural appropriateness of content in all language versions of a test.
For example, material that is context-appropriate for students in rural
areas—covering hunting, the local marketplace, agricultural pursuits,
and local games—might be unfamiliar to students in urban areas.
   Whatever the details of the method of assessment, the assessment
needs to provide valid and reliable information. Validity has several
facets, including the adequacy of an assessment instrument to sample
and represent the construct (for example, reading literacy) or the cur-
riculum area (for example, social studies) identified in the assessment
framework. The judgment of curriculum specialists is important here.
Furthermore, the assessment instrument should measure only what it
is designed to measure. For example, a test of mathematics or science
should assess students’ knowledge and skills in those areas, not their
competence in language. The reliability of assessment procedures in
national assessments usually involves estimating the extent to which
individual items in a test assess the overall construct the test is
designed to measure and, in the case of open-ended items, the extent
to which two or more markers agree in their scoring.
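
   As a purely illustrative sketch of the two reliability checks just
mentioned, the Python fragment below computes Cronbach’s alpha for a
small set of hypothetical dichotomously scored items and the percentage
of exact agreement between two markers; operational analyses use far
larger data sets and more refined indices.

# Internal consistency (Cronbach's alpha) and simple marker agreement,
# computed from hypothetical data.
import numpy as np

# Rows = students, columns = items scored 1 (correct) or 0 (incorrect)
scores = np.array([
    [1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 0, 1, 1, 1],
])

k = scores.shape[1]
item_variances = scores.var(axis=0, ddof=1).sum()
total_variance = scores.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_variances / total_variance)
print(f"Cronbach's alpha: {alpha:.2f}")

# Exact agreement between two markers scoring the same open-ended items
marker1 = np.array([2, 1, 0, 2, 1, 2])
marker2 = np.array([2, 1, 1, 2, 1, 2])
print(f"Exact agreement: {np.mean(marker1 == marker2):.0%}")
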

Responsibility for deciding how achievement will be assessed: Implementa-
tion agency.



HOW FREQUENTLY WILL ASSESSMENTS BE CARRIED OUT?


The frequency with which a national assessment is carried out varies
from country to country, ranging from every year to every 10 years. A
temptation may exist to assess achievement in the same curriculum
areas and in the same population every year, but this frequency is un-
necessary, as well as very expensive, if the aim is to monitor national
standards. In the United States, reading and mathematics are assessed
every second year and other subjects less frequently. The international
assessment of reading literacy (PIRLS) had a five-year span between
the first and second administration (2001–06). In Japan, achievement
in core curriculum areas was assessed every 10 years to guide curricu-
lum and textbook revision (Ishino 1995).
   If the aim of an assessment is to hold teachers, schools, and even
students accountable for their learning, testing may be carried out
every year. Furthermore, because such an assessment focuses on the
performance of individuals, as well as performance at the system level,
all (or most) students in the education system will be assessed. This
system has been operated in Chile and in England.
   If the purpose of an assessment is only to provide information on the
performance of the system as a whole, however, an assessment of a
sample of students in a particular curriculum area every three to five
years would seem adequate. Because education systems do not change
rapidly, more frequent assessments would be unlikely to register
change. Overfrequent assessments would more than likely limit the
impact of the results, as well as incur unnecessary costs.

Responsibility for deciding frequency of assessment: Ministry of education




HOW SHOULD STUDENT ACHIEVEMENT BE REPORTED?

Although policy makers probably prefer summary statistics, evi-
dence on the multidimensionality of achievement suggests that a
single index of performance, such as a total test score, may obscure
important information. An alternative approach is to provide differ-
entiated information that reflects strengths and weaknesses in a
country’s curriculum. The information would be even more valuable
if it distinguished between students’ knowledge of basic facts and
skills and their deeper or higher-order understanding.
   A variety of procedures have been used to describe student
achievements in national assessments, which reflect the richness of
the data that an assessment can provide (see book 5 in this series,
Reporting and Using Results from a National Assessment of Educa-
tional Achievement). The selection of one or more procedures should
be guided by the information needs of the ministry of education and
other stakeholders.


Item-Level Information

This information involves little more than simply reporting the per-
centage of students answering individual items correctly. A national
assessment might reveal that the majority of its students performed
poorly on a mathematics item involving the use of indices, or that
virtually all students were able to associate simple words with pictures.
In Ghana, for example, only 1 percent of students correctly answered
a question on light refraction in TIMSS (Ghana, Ministry of Educa-
tion, Youth, and Sports 2004). This kind of information, while too
detailed for national policy making, is likely to be of interest to cur-
riculum personnel, teacher trainers, and possibly textbook authors.
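   A minimal sketch of item-level reporting follows (in Python, with
invented item labels and responses); it simply computes the percentage
of students answering each item correctly.

# Percentage correct for each item, from hypothetical 0/1 responses.
import numpy as np

item_labels = ["light_refraction", "word_picture_match", "indices"]
responses = np.array([   # rows = students, 1 = correct, 0 = incorrect
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
    [1, 1, 0],
])

percent_correct = responses.mean(axis=0) * 100
for label, pct in zip(item_labels, percent_correct):
    print(f"{label}: {pct:.0f}% correct")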


Performance in Curriculum Domains

Items can be grouped into curriculum units or domains, and test
scores can be reported in terms of performance in each domain.
Reading items, for example, have been classified by ability to retrieve
information from a text, to make inferences from a text, to interpret
and integrate information, and to examine and evaluate text information
(Eivers and others 2005). Figure 4.1 illustrates how Lesotho reported
mathematics performance by content area.

  FIGURE 4.1
  Mean Percentage Correct Scores for Students’ Mathematics Performance, by
  Content Area, Lesotho
  [Bar chart showing mean percentage correct (student facility) in four content
  areas: number, measurement, shape, and data representation.]
  Source: Lesotho, Examinations Council of Lesotho and National Curriculum
  Development Centre 2006.
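   The same kind of calculation underlies domain-level reporting of the sort
shown in figure 4.1. The sketch below is a minimal illustration, assuming a
hypothetical lookup table that maps each item to a curriculum domain (for
example, number, measurement, shape, data representation); all file and
column names are invented for the example.

     import pandas as pd

     responses = pd.read_csv("item_responses.csv")      # 0/1 item columns only
     item_domains = pd.read_csv("item_domains.csv")     # columns: item, domain

     # Mean percentage correct (facility) for each item.
     facility = (responses.mean() * 100).rename("facility")
     facility = facility.reset_index().rename(columns={"index": "item"})

     # Average facility within each curriculum domain, as plotted in figure 4.1.
     by_domain = (facility.merge(item_domains, on="item")
                          .groupby("domain")["facility"].mean())
     print(by_domain.round(1))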


Performance Standards

Performance standards describe how well students must perform on a test
to be classified as reaching, for example, a “basic,” “proficient,” or
“advanced” level in a curriculum area. The number of levels may vary
(see A.2 in appendix A for a description of six levels of reading profi-
ciency used in a national assessment in Vietnam, and see C.1 in
appendix C for eight reading levels and eight mathematics skill levels
used in SACMEQ). The selection of cutoff points between levels
involves the use of statistical data and subjective judgment.


Mastery Standard

Mastery levels can be based on an overall test score (for example, cor-
rectly answering a specified percentage of test items). In Sri Lanka,
the mastery level for a grade 4 national assessment was set at 80 per-
cent. Fewer than 40 percent achieved that level in the students’ first
language or in mathematics, and fewer than 10 percent in English
(Perera and others 2004). Mastery levels can also be based on reaching a
specified performance level. In the U.S. state of Connecticut, five performance
levels (“below basic,” “basic,” “proficient,” “goal,” and “advanced”) are
used. The “goal” level is regarded as a challenging but
reasonable level of expectation for students and is accepted as the mas-
tery level. The data in table 4.4 show that well over half the students
in grades 3 and 4 achieved the “goal” or “mastery” level in all three
curriculum areas.
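   Where a mastery standard is defined as a cutoff on the overall test score,
the proportion of students reaching it is straightforward to compute. The
minimal Python sketch below uses the 80 percent cutoff cited above for Sri
Lanka purely as an illustration; the data file and its layout are hypothetical.

     import pandas as pd

     responses = pd.read_csv("item_responses.csv")   # 0/1 item columns only
     n_items = responses.shape[1]

     # Each student's percentage score, and the share at or above the cutoff.
     student_scores = responses.sum(axis=1) / n_items * 100
     mastery_rate = (student_scores >= 80).mean() * 100
     print(f"Students at or above the mastery level: {mastery_rate:.1f}%")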

Responsibility for deciding how student achievement is reported: Imple-
mentation agency with input from NSC



WHAT KINDS OF STATISTICAL ANALYSES SHOULD
BE CARRIED OUT?

Some analyses will be dictated by the policy questions that prompted
the assessment in the first instance. Most national assessments provide
evidence on achievement by gender, region, urban or rural location,
ethnic or language group membership, and type of institution
attended (public or private). Some assessments also provide data on
the quality of school facilities (for example, Kenya). Analyses
involving those variables are relatively straightforward and are
intuitively meaningful to policy makers and politicians. They do
not, however, adequately represent the complexity of the data.
More complex forms of analysis are required if we are, for example,
to throw light on the school and background factors that contribute
to achievement. Examples of the use of complex statistical proce-
dures are found in the description of the Vietnamese national assess-
ment (see A.2 in appendix A).
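   In computational terms, the more straightforward disaggregations mentioned
above (by gender, region, or school type) amount to grouped summary statistics.
The sketch below assumes a hypothetical student-level file that already contains
a total score and the background variables of interest; in a real analysis,
sampling weights and design-based standard errors would also be applied.

     import pandas as pd

     # Hypothetical merged student file with a total score and background variables.
     students = pd.read_csv("student_scores.csv")   # columns: score, gender, region, school_type

     for variable in ["gender", "region", "school_type"]:
         summary = students.groupby(variable)["score"].agg(["mean", "std", "count"])
         print(summary.round(1), "\n")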
   The limitations of analyses and problems in inferring causation from
studies in which data are collected at the same time on achievement
and other variables should be recognized. Although it is difficult, some-
times impossible, to disentangle the effects of community, home, and
school factors on students’ learning, this complexity has not deterred
some investigators from causally interpreting data collected in national
and international assessments.

TABLE 4.4
Percentage Achieving Goal or Mastery Level by Grade, Connecticut, 2006

                        Mathematics                  Reading                     Writing
             At or above    At or above    At or above    At or above    At or above    At or above
 Grade       goal (%)       advanced (%)   goal (%)       advanced (%)   goal (%)       advanced (%)
   3             56             22             54             17             61             22
   4             59             22             58             16             63             22
 Source: Connecticut Department of Education 2006.

Responsibility for deciding on methods of statistical analysis: Implemen-
tation agency.



HOW SHOULD THE RESULTS OF A NATIONAL ASSESSMENT
BE COMMUNICATED AND USED?

If the results of a national assessment are to affect national education
policy, they should be reported as soon as possible after the comple-
tion of data analysis. In the past, technical reports that featured a
considerable amount of data tended to be the sole form of reporting.
Some groups of users (for example, teachers in Chile; see A.7 in
appendix A), however, considered those reports overtechnical. As a
result, the requirement to provide other forms of reports is now
increasingly recognized. Those alternatives include short summary
reports that focus on the main findings for busy policy makers; press
releases; special reports for radio and television; and separate reports
for schools, teachers, curriculum developers, and teacher trainers. In
some countries (for example, Sri Lanka), separate reports are pre-
pared for each province. A report in Ethiopia was translated into
four major languages. The information needs of stakeholders should
determine the contents of additional reports.
    The ministry of education should make adequate budgetary provi-
sion at the planning stage for report preparation and dissemination. In
collaboration with the national steering committee, it should devise
procedures to communicate the findings of national assessments to
stakeholders. Appropriate strategies to communicate results should take
into account the fact that users (whether administrators or teachers)
vary greatly in their ability to understand and apply statistical infor-
mation in their decision making. Obviously, there is no point in
producing reports if the information they contain is not adequately
disseminated. Thus, a dissemination strategy is also required so that
relevant information reaches all stakeholders. The strategy should
identify potential users (key institutions and individuals) and their
levels of technical expertise.



   National assessment results have been used to set benchmarks for
monitoring learning achievement levels (for example, in Lesotho), to
reform curricula, to provide baseline data on the amount and quality of
educational materials in schools (for example, in Vietnam), to identify
correlates of achievement, and to diagnose aspects of the curriculum
that are not being mastered by students. Uruguay, for
instance, used its national assessment results to help prepare teacher
guides and to identify the curriculum content and behavioral areas
that subsequently helped direct a large-scale teacher in-service pro-
gram (see A.3 in appendix A).
   Book 5 in this series, Reporting and Using Results from a National
Assessment of Educational Achievement, has an extensive section on
report writing and the use of national assessment results.

Responsibility for communicating and using national assessment results:
Implementation agency, ministry of education, NSC, teacher training
providers, curriculum authority, teachers.


WHAT ARE THE COST COMPONENTS OF
A NATIONAL ASSESSMENT?

The cost of a national assessment will vary greatly from one country
to another, depending on the salary levels of personnel and the cost
of services. Within a country, cost will also vary, depending on some
or all of the following factors (Ilon 1996).

• Implementing agency. Costs will vary depending on whether the
  agency has the necessary facilities and expertise or needs to up-
  grade or employ full-time or part-time consultants. The cost of
  providing facilities and equipment, including computers and soft-
  ware, also needs to be taken into account.

• Instrument content and construction. Options for the selection of the
  content and form of assessment should be considered in terms of
  cost, as well as other factors, such as validity and ease of adminis-
  tration. Multiple-choice items are more expensive to construct
  than open-ended items but are usually less expensive to score. The
  cost of translating tests, questionnaires, and manuals and of training
  item writers also needs to be considered.

• Numbers of participating schools and students. A census-based
  assessment will obviously be more expensive than a sample-based
  one. Costs increase if reliable data are required for sectors of the sys-
  tem (for example, states or provinces). Targeting an age level is likely
  to be more expensive than targeting a grade level because students of
  any particular age may be spread over a number of grades, requiring
  additional assessment material and testing sessions.

• Administration. Data collection tends to be the most expensive
  component of a national assessment. It involves obtaining informa-
  tion from schools in advance of the assessment; designing, printing,
  packaging, and dispatching test materials and questionnaires; and
  establishing a system to administer instruments. Factors that con-
  tribute to overall cost include (a) the number of schools and stu-
  dents that participate, (b) travel, (c) difficulty in gaining access to
  schools, (d) accommodation for enumerators (if needed), and (e)
  the collection and return of completed tests and questionnaires.

• Scoring, data management, and data entry. Costs will vary according
  to the number of participating schools, students, teachers, and par-
  ents; the number of open-ended items; whether items are hand or
  machine scored; the number of inter-rater reliability studies; and
  the quality of test administration and scoring.

• Analysis. Analytic costs will depend on the type of assessment pro-
  cedures used and the availability of technology for scoring and analy-
  sis. Although machine scoring is normally considered to be cheaper
  than hand scoring, this reduced cost may not be the case in a coun-
  try where technology costs are high and labor costs are low.

• Reporting. Costing should take account of the fact that different
  versions of a report will be required for policy makers, teachers,
  and the general public and of the nature and extent of the report
  dissemination strategy.

• Follow-up activities. Budgetary provision may have to be made for
  activities such as in-service teacher training that is based on the find-
  ings of the national assessment, briefings for curriculum bodies, and
  secondary analyses of the data. Provision may also have to be made
  to address skill shortages in key professional areas (for example,
  statistical analysis). Budgetary provision should be made for likely
  salary increases over the life of the assessment (normally two to three
  years), for inflation, and for unexpected events (contingencies).

   Some national assessments have not achieved their basic objectives
because the budget was inadequate. Although the overall budget is
the responsibility of the ministry of education, people with expertise
in costing and with large-scale data projects should participate in the
budgetary discussions. Ministry officials who are unfamiliar with
large-scale data projects are unlikely to appreciate the need to budget
for activities such as pilot-testing and data cleaning.
   Figures for the U.S. NAEP provide a rough guide to costing: data
collection (30 percent), instrument development (15 percent), data
analysis (15 percent), reporting and dissemination (15 percent),
sampling (10 percent), data processing (10 percent), and governance
(5 percent) (Ilon 1996). In some countries, where, for example,
ministry or examination board officials carry out test administration
as part of their normal duties, separate budgetary provision may not
be made for some activities. Costs and wages will vary depending on
national economic conditions. In Cambodia (which is ranked outside
the top 100 countries in the world in terms of gross national income),
item writers were paid the equivalent of US$5 a day in 2006.
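   As a worked illustration of how such a rough breakdown might be applied
at the planning stage, the short sketch below allocates a purely hypothetical
total budget across the NAEP-style cost categories listed above; the total
figure is invented for the example.

     # Hypothetical total budget (US dollars) allocated with the NAEP-style shares cited above.
     total_budget = 500_000

     shares = {
         "data collection": 0.30,
         "instrument development": 0.15,
         "data analysis": 0.15,
         "reporting and dissemination": 0.15,
         "sampling": 0.10,
         "data processing": 0.10,
         "governance": 0.05,
     }

     for activity, share in shares.items():
         print(f"{activity:30s} {total_budget * share:>10,.0f}")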
   Countries with very limited resources may not find expending those
resources on a national assessment advisable, especially when their edu-
cation system is likely to have many unmet needs. If they do wish to
engage in national assessment activity, they would be well advised to
limit the number of curriculum areas assessed (perhaps to one, at one
grade level) and to seek technical assistance and the support of donors.
   In considering costs, it is well to bear in mind that the cost of ac-
countability programs in general—and of national assessments in
particular—is very small compared to the cost of other educational
programs (see Hoxby 2002). The cost of not carrying out an
assessment—of not finding out what is working and what is not work-
ing in the education system—is likely to be much greater than the
cost of an assessment. Book 3 of this series, Implementing a National
Assessment of Educational Achievement, discusses issues relating to
costing a national assessment.




Responsibility for estimating the component costs of a national assessment:
Ministry of education with consultant input.


SUMMARY OF DECISIONS

Table 4.5 identifies the agencies with primary responsibility for deci-
sions relating to the 12 components of a national assessment that are
discussed in this chapter.
TABLE 4.5
Bodies with Primary Responsibility for Decisions in a National Assessment

                                                     Primary responsibility
                                           Ministry of    National Steering
 Decision                                  education      Committee            Agency    Other
 Give policy guidance                           •
 Carry out national assessment                                                    •
 Administer tests and questionnaires                                              •
 Choose population to be assessed               •                 •
 Determine sample or population                 •
 Decide what to assess                          •                 •               •
 Decide how achievement is assessed                                               •
 Determine frequency of assessment              •
 Select methods of reporting                                      •               •
 Determine statistical procedures                                                 •
 Identify methods of communicating
   and using results                            •                 •               •        •
 Estimate cost components                       •                                          •
 Source: Authors.
CHAPTER 5

ISSUES IN THE DESIGN, IMPLEMENTATION, ANALYSIS, REPORTING,
AND USE OF A NATIONAL ASSESSMENT


                         In this chapter, we identify a number of issues that
     are relevant to the confidence that stakeholders can have in the results
     of a national assessment. For five components of national assessment
     activity (design, implementation, data analysis, report writing, and
     dissemination and use of findings), we suggest a number of activities
     that will enhance confidence, which, in turn, should contribute to the
     optimum use of findings. For each component, we also identify
     common errors that have been made in national assessments and that
     evaluators should be aware of and should avoid.



     DESIGN

     The design of the assessment sets out the broad parameters of the
     exercise: the achievements to be assessed, the grade or age level at
     which students will be assessed, the policy issues to be addressed, and
     whether the assessment will involve the whole population or a sample
     of students.








Recommended Activities

• Involve senior policy makers from the outset to ensure political
  support and to help frame the assessment design.
• Determine and address the information needs of policy makers
  when selecting aspects of the curriculum, grade levels, and
  population subgroups (for example, by region or by gender) to be
  assessed.
• Obtain teacher support by involving teacher representatives in
  assessment-related policy decisions.
• Be aware that attaching high stakes to students’ performance may
  lead to teacher opposition and to a narrowing of the effective
  curriculum as teachers focus their teaching on what is assessed.



Common Errors

• Failure to make adequate financial provision for key aspects of a
  national assessment, including report writing and dissemination.
• Failure to set up a national steering committee and to use it as a
  source of information and guidance during the course of the national
  assessment.
• Failure to gain government commitment to the process of national
  assessment, which is reflected in (a) a failure to identify key policy
  issues to be addressed at the design stage of the assessment, (b) the
  absence of a national steering committee, or (c) separate national
  assessments being carried out at the same time (often supported by
  external donors).
• Failure to involve key stakeholders (for example, teachers’ repre-
  sentatives or teacher trainers) in planning the national assessment.
• Omission of a subgroup from the population assessed that is likely
  to seriously bias the results of the assessment (for example, students
  in private schools or students in small schools).
• Setting unrealistic test score targets (for example, 25 percent
  increase in scores over a four-year period).
• Allowing inadequate time for test development.



IMPLEMENTATION

Implementation covers a vast range of activities, from the develop-
ment of appropriate assessment instruments, to the selection of the
students who will respond to the instruments, to the administration
of the instruments in schools.


Recommended Activities

• Describe in detail the content and cognitive skills to be assessed, as
  well as the background variables on which information will be collected.
• Entrust test development to personnel who are familiar with both
  curriculum standards and learning levels of students (especially
  practicing teachers).
• Use assessment instruments that adequately assess the knowledge
  and skills about which information is required and that will provide
  information on subdomains of knowledge or skills (for example,
  problem solving) rather than just an overall score.
• Develop clear and unambiguous test and questionnaire items, and
  present them in a clear and attractive manner.
• Ensure that adequate procedures are in place to assess the equiva-
  lence of language versions if translation of instruments is necessary.
• Pilot-test items, questionnaires, and manuals.
• Review items to identify ambiguities and possible bias relating to
  student characteristics (for example, gender, location, or ethnic
  group membership), and revise or delete if necessary.
• Proofread all materials carefully.
• Establish procedures to ensure the security of all national assess-
  ment materials (for example, tests and questionnaires) throughout
  the whole assessment process, so that materials do not fall into the
  hands of unauthorized people.
• Secure the services of a person or unit with sampling expertise.
• Specify the defined target population (the population from which
  a sample will actually be drawn—that is, the sampling frame) and
  the excluded population (for example, elements of the population
     that are too difficult to reach or that would not be able to respond
     to the instrument). Precise data on excluded populations should
     be provided.
•    Ensure that the proposed sample is representative and is of sufficient
     size to provide information on populations of interest with an
     acceptable level of error.
•    Select members of the sample from the sampling frame according
     to known probabilities of selection (a minimal illustration follows
     this list).
•    Follow a standard procedure when administering tests and ques-
     tionnaires. Prepare an administration manual.
•    Ensure that test administrators are thoroughly familiar with the con-
     tents of tests, questionnaires, and manuals and with administrative
     procedures.
•    Prepare and implement a quality assurance mechanism to cover,
     among other things, test validity, sampling, printing, test adminis-
     tration, and data preparation.
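   As a minimal illustration of the probability-sampling point above (see the
bullet on selecting members of the sample), the sketch below draws a systematic
sample of schools with probability proportional to size (enrollment), a design
commonly used in national assessments. The sampling-frame file, its columns,
and the sample size are hypothetical; a real design would add stratification
and explicit rules for very large and very small schools.

     import numpy as np
     import pandas as pd

     frame = pd.read_csv("school_frame.csv")    # columns: school_id, enrollment
     n_schools = 150                            # illustrative sample size

     # Systematic selection with probability proportional to enrollment:
     # larger schools have proportionally higher chances of selection.
     frame = frame.sample(frac=1, random_state=1).reset_index(drop=True)  # random order
     cumulative = frame["enrollment"].cumsum()
     interval = frame["enrollment"].sum() / n_schools
     start = np.random.default_rng(1).uniform(0, interval)
     selection_points = start + interval * np.arange(n_schools)
     selected = frame.iloc[cumulative.searchsorted(selection_points)]
     print(selected[["school_id", "enrollment"]].head())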


Common Errors

• Assigning test development tasks to people who are unfamiliar
  with the likely levels of student performance (for example, aca-
  demics), resulting in tests that are too difficult.
• Representing curriculum inadequately in tests, as indicated, for
  example, in failure to include important aspects of the curriculum.
• Failing to pilot-test items or pilot-testing on an unrepresentative
  sample of the population.
• Using an insufficient number of test items in the final version of
  the test.
• Failing to give a clear definition of the construct being assessed (for
  example, reading).
• Including an insufficient number of sample items for students who
  are unfamiliar with the testing format.
• Not encouraging students to seek clarification from the test super-
  visor before taking the test.
• Failing to give adequate notification to printers of tests, question-
  naires, and manuals.
• Paying insufficient attention to proofreading tests, questionnaires,
  and administration manuals.
• Using inadequate or out-of-date national data on pupils and school
  numbers for sampling.
• Failing to carry out proper sampling procedures, including selecting
  a predetermined percentage of schools (for example, 5 percent).
• Providing inadequate training to test and questionnaire adminis-
  trators.
• Allowing outside intervention (for example, a principal sitting in the
  classroom) during test administration.
• Allowing students to sit close to each other during the assessment,
  which encourages copying.
• Failing to establish a tradition of working outside normal work
  hours, if needed, to complete key tasks on time.



ANALYSIS

Statistical analyses organize, summarize, and interpret the data collected
in schools. They should address the policy issues identified in the
design of the national assessment.


Recommended Activities

• Secure competent statistical services.
• Prepare a codebook with specific directions for preparing data for
  analysis.
• Check and clean data to remove errors (for example, relating to
  numbers, out-of-range scores, and mismatches between data
  collected at different levels).
• Calculate sampling errors, taking into account complexities in the
  sample, such as stratification and clustering.
• Weight data so that the contribution of the various sectors of the
  sample to aggregate achievement scores reflects their proportions
  in the target population (see the sketch following this list).
• Identify the percentage of students who met defined acceptable
  levels or standards.
• Analyze assessment data to identify factors that might account
  for variation in student achievement levels to help inform policy
  making.
• Analyze results by curriculum domain. Provide information on the
  subdomains of a curriculum area (for example, aspects of reading,
  mathematics).
• Recognize that a variety of measurement, curricular, and social
  factors may account for student performance.
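   To make the weighting and sampling-error points above concrete, the
following minimal sketch computes a weighted national mean score and a rough
cluster-based approximation of its standard error, treating schools as primary
sampling units. File names and column names are hypothetical, and the
approximation is not a substitute for the design-based procedures (such as
jackknife or balanced repeated replication) used in practice.

     import numpy as np
     import pandas as pd

     # Hypothetical student file: score, sampling weight, and school (cluster) identifier.
     data = pd.read_csv("national_sample.csv")   # columns: score, weight, school_id

     # Weighted national mean: each student counts in proportion to the sampling design.
     weighted_mean = np.average(data["score"], weights=data["weight"])

     # Rough standard error based on the variability of weighted school means.
     cluster_means = data.groupby("school_id").apply(
         lambda g: np.average(g["score"], weights=g["weight"])
     )
     standard_error = cluster_means.std(ddof=1) / np.sqrt(len(cluster_means))

     print(f"Weighted mean: {weighted_mean:.1f} (approximate SE: {standard_error:.2f})")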


Common Errors

• Using inappropriate statistical analyses, including failing to weight
  sample data in the analysis.
• Basing results on small numbers (for example, a minority of sam-
  pled teachers who might have responded to a particular question).
• Contrasting student performance in different curriculum areas,
  and claiming that students are doing better in one area on the basis
  of mean score differences.
• Failing to emphasize the arbitrary nature of selected test score
  cutoff points (for example, mastery versus nonmastery, pass versus
  fail), dichotomizing results, and failing to recognize the wide range
  of test scores in a group.
• Not reporting standard errors associated with individual statistics.
• Computing and publicizing school rankings on the basis of achieve-
  ment test results without taking into account key contextual
  factors that contribute to the ranking. Different rankings emerge
  when school performances are compared using unadjusted perfor-
  mance scores, scores adjusted for contextual factors (for example,
  the percentage of students from poor socioeconomic backgrounds),
  and scores adjusted for earlier achievement.
• Inferring causation where it might not be justified (for example,
  attributing differences in learning achievement to one variable,
  such as private school administration or class size).
• Comparing test results over two time periods even though non-
  equivalent test items were used.
• Comparing test results over two time periods without reporting the
  extent to which important background conditions (for example,
  curriculum, enrollment, household income, or level of civil strife)
  might have changed in the interim. Although most education-related
  variables tend not to change rapidly over a short time (for example,
  three to four years), some countries have introduced policies that
  have resulted in major changes in enrollment. Following the aboli-
  tion of school fees, for example, the number of students enrolling in
  schools increased greatly in Malawi and Uganda.
• Limiting analysis in the main to a listing of mean scores of geographical
  or administrative regions.



REPORT WRITING

There is little point in carrying out a national assessment unless the
findings are clearly reported with the needs of various stakeholders
in mind.


Recommended Activities

• Prepare reports in a timely manner with the needs of clients
  in mind, and present them in a format that is readily understood
  by interested parties, especially those in a position to make
  decisions.
• Report results by gender and region, if sample design permits.
• Provide adequate information in the report or in a technical man-
  ual to allow for replication of the assessment.


Common Errors

• Writing overly technical reports.
• Failing to highlight a few main findings.
• Making recommendations in relation to a specific variable even though
  the analysis questioned the validity of the data on that variable.
• Failing to relate assessment results to curriculum, textbook, and
  teacher training issues.
• Not acknowledging that factors outside the control of the teacher
  and the school contribute to test score performance.
• Failing to recognize that differences between mean scores may not
  be statistically significant.
• Producing the report too late to influence relevant policy decisions.
• Including an overly extensive review of the literature in the assessment
  report.
• Failing to publicize the key relevant messages of the report for sepa-
  rate stakeholder audiences.



DISSEMINATION AND USE OF FINDINGS

It is important that the results of national assessments are not left on
policy makers’ shelves but are communicated in appropriate language
to all who can affect the quality of students’ learning.


Recommended Activities

• Provide results to stakeholders, especially key policy makers and
  managers.
• Use the results where appropriate for policy making and to improve
  teaching and curricula.


Common Errors

• Ignoring the results when it comes to policy making.
• Among key stakeholders (for example, teacher trainers or curricu-
  lum personnel), failing to consider the implications of the national
  assessment findings.
• Among the national assessment team, failing to reflect on lessons
  learned and to take note of those lessons in follow-up assessments.
CHAPTER 6

INTERNATIONAL ASSESSMENTS OF STUDENT ACHIEVEMENT


                         In this chapter, we describe international assess-
     ments of students’ educational achievement because they are used in
     many countries to provide data for a national assessment. First, we
     outline the main features of international assessments in terms of
     how they are similar to and differ from national assessments. Next,
     we describe growth in international assessment activity. Then the
     chapter identifies advantages of international assessments as well as
     problems associated with these assessments.
        An international assessment of student achievement is similar in
     many ways to a national assessment. Both exercises use similar
     procedures (in instrument construction, sampling, scoring, and analy-
     sis). They also may have similar purposes: (a) to determine how well
     students are learning in the education system; (b) to identify particular
     strengths and weaknesses in the knowledge and skills that students
     have acquired; (c) to compare the achievements of subgroups in the
     population (for example, defined in terms of gender or location); or
     (d) to determine the relationship between student achievement and
     a variety of characteristics of the school learning environment and
     of homes and communities. Furthermore, both exercises may
     attempt to establish whether student achievements change over



time (Kellaghan and Greaney 2004). In practice, however, why a
country decides to participate in an international assessment is not
always clear (Ferrer 2006).
   The main advantage of an international assessment over a national
assessment is that it is explicitly designed to provide
policy makers, educators, and the general public with information about
their education system in relation to one or more other systems (Beaton
and others 1999; Husén 1973; Postlethwaite 2004). This information is
assumed to put pressure on policy makers and politicians to improve
services. Furthermore, it is hoped that the information will contribute
to a greater understanding of the factors (that vary from country to
country) that contribute to differences in student achievement.
   The curriculum areas that have attracted the largest participation
rates in international studies over the years are reading comprehension,
mathematics, and science. Studies have been carried out at primary-
and secondary-school levels. Usually, a combination of grade and age
is used to determine who will participate (for example, students in
two adjacent grades that contain the largest proportions of 9-year-
olds and 13-year-olds; students in the grade levels containing most
9-year-olds and most 14-year-olds; the upper of two adjacent grades
with the most 9-year-olds). In yet another international study, students
of a particular age were selected (15-year-olds).
   The results of international assessments such as the Trends in
International Mathematics and Science Study (TIMSS) and the Pro-
gramme for International Student Assessment (PISA) and regional
assessments can and have been used to prepare separate national
reports on country-level performance. International databases can be
accessed to carry out such analyses.
   Countries vary considerably in the extent to which they rely on
international and national assessment results for policy making. Many
industrial countries conduct their own national assessments, as well as
participating in international assessments. The United States has its
own National Assessment of Educational Progress for grades 4, 8, and
12; it also participates in international assessments of achievement.
Some industrial countries have participated in international assess-
ments but do not conduct national assessments (for example, the
Russian Federation and Germany). Similarly, some developing countries
have used international assessments to provide their sole form of
national assessment (Braun and Kanjee 2007). Many of the world’s
poorest countries have not taken part in international assessments or
carried out national assessments, although the situation has changed in
recent years.



GROWTH IN INTERNATIONAL ASSESSMENT ACTIVITY

International assessment activity began when a group of researchers
met in 1958 to consider the possibility of undertaking a study of mea-
sured outcomes and their determinants within and between systems
of education (Husén and Postlethwaite 1996). Since then, more than
60 countries have participated in international studies of achievement
in one or more of a variety of curriculum areas: reading, mathematics,
science, writing, literature, foreign languages, civic education, and com-
puter literacy. The best-known international assessments are TIMSS
(see B.1 in appendix B) and the Progress in International Reading
Literacy Study (PIRLS) (see B.2 in appendix B) of the International
Association for the Evaluation of Educational Achievement (IEA)
and PISA (see B.3 in appendix B) of the Organisation for Economic
Co-operation and Development (OECD). Regional assessments in
reading and mathematics have been carried out in southern and east-
ern Africa (see C.1 in appendix C), in francophone Africa (see C.2 in
appendix C), and in Latin America (see C.3 in appendix C). A num-
ber of features on which TIMSS and PISA differ are presented in table
6.1 (see also B.1 and B.3 in appendix B).
   The number of countries participating in international studies has
increased over the years. While typically fewer than 20 countries par-
ticipated up to the 1980s, the IEA Reading Literacy Study attracted
32 countries in 1991. In 2003, 52 countries participated in TIMSS
and 41 in PISA (30 member states of the OECD and 11 “partner”
countries). Furthermore, international studies in recent years have
accorded a major focus to monitoring performance over time. All
three major current international assessments (TIMSS, PIRLS, and
PISA) are administered on a cyclical basis and are now described as
“trend” studies.
TABLE 6.1
Comparison of TIMSS and PISA

Purposes
  TIMSS 2003: To provide comparative evidence on the extent to which students have
  mastered official school curriculum content in mathematics and science, which is
  common across a range of countries; to monitor changes in achievement levels over
  time; to monitor students’ attitudes toward mathematics and science; and to examine
  the relationship between a range of instructional and school factors and achievement.
  (Reading is covered in the separate PIRLS assessment.)
  PISA 2003: To provide comparative evidence on the “yield” of the school system in the
  principal industrial countries, and to assess whether students can apply their knowledge
  and competencies in reading, mathematics, and science to real-world situations; to
  monitor changes in achievement levels and equity of learning outcomes over time; to
  monitor student approaches to learning and attitudes to mathematics, science, and
  reading; and to provide a database for policy development.

Framework
  TIMSS 2003: Developed by content experts from some participating countries.
  PISA 2003: Developed by content experts from some participating countries.

Target population
  TIMSS 2003: Grades 4 and 8.
  PISA 2003: 15-year-olds.

Curriculum appropriateness
  TIMSS 2003: Designed to assess the official curriculum, organized around recognized
  curriculum areas common to participating countries.
  PISA 2003: Designed to cover knowledge acquired both in school and out of school,
  defined in terms of overarching ideas and competencies applied to personal, educational,
  occupational, public, and scientific situations.

Item content differences (mathematics, grade 8)
  TIMSS 2003: Item distribution: number, 30%; algebra, 25%; data, 15%; geometry, 15%;
  measurement, 15%.
  PISA 2003: Overarching ideas: quantity; space and shape; change and relationships;
  uncertainty. Item distribution: number, 31.8%; geometry, 21.2%; statistics, 21.2%;
  functions, 10.6%; discrete mathematics, 5.9%; probability, 5.9%; algebra, 3.5%.

Cognitive processes
  TIMSS 2003 (grade 8): Solving routine problems, 40%; using concepts, 20%; knowing
  facts and procedures, 15%; reasoning, 25%.
  PISA 2003: Item distribution: connection, 47%; reproduction, 31%; reflection, 22%.

Item types (mathematics)
  TIMSS 2003: About two-thirds multiple-choice items, with the remainder being
  constructed-response or open-ended items.
  PISA 2003: About one-third multiple-choice items, with the remainder generally being
  closed (one possible correct response) or open (more than one possible correct response)
  constructed-response items.

Frequency
  TIMSS 2003: Every four years, with equal emphasis on mathematics and science in
  each cycle.
  PISA 2003: Every three years, with extensive coverage of one domain (subject) every
  nine years (reading in 2000, mathematics in 2003, and science in 2006), plus less
  extensive coverage of the other two every three years.

Geographical coverage
  TIMSS 2003: 48 countries: 20 high-income, 26 middle-income, and 2 low-income countries.
  PISA 2003: 30 OECD countries as well as 11 other countries.

Analysis
  TIMSS 2003: Four benchmark levels and a mean score, based on all participating countries.
  PISA 2003: Seven mathematics proficiency levels and a mean score, based on OECD
  countries.

Source: TIMSS and PISA frameworks; U.S. National Center for Education Statistics n.d.;
World Development Indicators database.




   Participation by nonindustrial countries in international studies has
generally been low. Nevertheless, in line with the general increase in
the number of countries that have taken part in international studies,
the number of nonindustrial countries has increased over the years.
TIMSS attracted the largest numbers in 2003 (seven from Africa) and
2007 (six from Africa). As was the case generally in international
studies, nonindustrial countries have shown a greater interest in taking
part in studies of mathematics and reading than in studies of other
curriculum areas.
   Recent growth in participation in international studies can be
attributed to globalization, to a movement in health and education to
benchmark services against those in other countries, and to interest in
global mandates. Some research evidence supports the view that
educational quality (in particular those aspects of it represented by
mathematics and science achievements) plays an important role in
economic growth, though the evidence is not entirely consistent across countries
or over time (Coulombe, Tremblay, and Marchand 2004; Hanushek
and Kimko 2000; Hanushek and Wössmann 2007; Ramirez and
others 2006). Whatever the reason, education policy around the
world has increasingly focused on the need to monitor aggregate
student achievement in an international context.



ADVANTAGES OF INTERNATIONAL ASSESSMENTS


A variety of reasons have been proposed to encourage countries to
participate in an international assessment of student achievement.
Perhaps the most obvious is that international studies provide a
comparative framework in which to assess student achievement and
curricular provision in a country and to devise procedures to address
perceived deficiencies (Štraus 2005). By comparing results across countries,
policy makers can use assessment findings to help define what
is achievable, how achievement is distributed, and what relationships
exist between average achievement and its distribution. For example,
can high average achievement coexist with narrow disparities in per-
formance? Results from PISA suggest that it can.



   Data on achievement provide only limited information. It has
been argued that an advantage of international studies is that they
can capitalize on the variability that exists across education systems,
thereby broadening the range of conditions that can be studied
beyond those operating in any one country (Husén 1973). On this
basis, the analysis of data collected in these studies routinely consid-
ers associations between achievement and a wide range of contex-
tual variables. The range of variables considered includes curriculum
content, time spent on school work, teacher training, class size, and
organization of the education system. Clearly, the value of interna-
tional studies is enhanced to the extent that they provide researchers
and policy makers with information that suggests hypotheses about
the reasons students differ in their achievements from country to
country. The studies also provide a basis for the evaluation of policy
and practices.
   International assessments have the potential to bring to light the
concepts for understanding education that have been overlooked in a
country (for example, in defining literacy or in conceptualizing cur-
ricula in terms of intention, implementation, and achievement; see,
for example, Elley 2005). The assessments can also help identify and
lead to questioning of assumptions that may be taken for granted (for
example, the value of comprehensive compared to selective educa-
tion, smaller class sizes being associated with higher achievement, or
grade repetition benefiting students).
   International studies are likely to attract the attention of the media
and of a broad spectrum of stakeholders, such as politicians, policy
makers, academics, teachers, and the public. Differences between
countries in levels of achievement are obvious in the descriptive
statistics that are provided in reports of the studies. Indeed, those
differences are usually highlighted in “league tables” in which countries
are ranked in terms of their mean level of achievement. The com-
parative data provided in these studies have more “shock value” than
the results of a national assessment. Poor results can encourage debate,
which, in turn, may provide politicians and other policy makers with
a rationale for increased budgetary support for the education sector,
particularly if poor results are associated with a lower level of expen-
diture on education.




   An important feature of an international assessment is that it
provides data that individual countries can use to carry out within-
country analyses for what becomes, in effect, a national assessment
report. This practice is followed by countries that participate in PISA
(see B.3 in appendix B) and SACMEQ (see C.1 in appendix C). The
practice is enhanced if, in addition to the data collected for the inter-
national study, data that relate to issues of specific interest or concern
in individual countries are also collected.
   Participation in international assessments has a number of practical
advantages, particularly for countries that do not have the capacity in
their universities to develop the kinds of skills needed in national
assessments. First, a central agency may carry out national-level anal-
yses that can be used in individual country reports. Second, studies
may contribute to the development of local capacity in a variety of
technical activities: sampling, defining achievements, developing
tests, analyzing statistics, and writing reports. Third, staffing require-
ments and costs (for example, for instrument development, data
cleaning, and analysis) may be lower than in national assessments
because costs are shared with other countries.
   A study of the effect of TIMSS on the teaching and learning of
mathematics and science in participating countries provides evidence
of the variety of activities that an international study can spawn
(Robitaille, Beaton, and Plomp 2000):
• TIMSS results featured in parliamentary discussions about planned
  changes in education policy (Japan).
• The minister for education established a mathematics and science
  task force (New Zealand).
• The president directed that a “rescue package” be implemented to
  improve performance in science and mathematics (in which teacher
  training would receive particular attention) (the Philippines).
• National benchmarks were established in literacy and numeracy
  (Australia).
• Results contributed to the development of new educational stan-
  dards in mathematics and science (Russian Federation).
• Results helped change the nature of public discussions in the field
  of education from opinion-based discussions to fact-based discus-
  sions (Switzerland).
• Results led to the development of instructional materials that are
  based on analysis of the common misconceptions and errors of
  students in their response to TIMSS tasks (Canada).
• Results accelerated changes in revision of curricula (Czech Repub-
  lic; Singapore).
• TIMSS results were identified as one of a number of factors influ-
  encing policy changes in mathematics education (England).
• Committees were formed to revise mathematics and science
  curricula (Kuwait).
• New topics were added to the mathematics curriculum (Romania).
• New content was introduced to the mathematics and science
  curriculum relating to real-life situations (Spain).
• Results helped highlight the need to improve the balance between
  pure mathematics and mathematics in context (Sweden).
• TIMSS findings highlighted beliefs about gender differences and
  negative attitudes to science and mathematics and were used as a
  basis for curriculum reform and teachers’ professional develop-
  ment (Republic of Korea).
• Results influenced the outcome of discussions about improving the
  organization of, and emphasis in, teacher education (Iceland).
• TIMSS results led to taking steps to strengthen teacher profes-
  sional development in mathematics and science (Norway; the
  United States).
• A centralized examination system was established, partly in response
  to TIMSS results (Latvia).
• TIMSS findings influenced major changes in teaching, school and
  class organization, teacher education, and target-setting for schools
  (Scotland).
• TIMSS findings affected educational research, standards develop-
  ment, curriculum document development, teacher studies, math-
  ematics and science teaching methodologies, and textbook devel-
  opment (Slovak Republic).
   Analyses of PISA data have done the following:
• Cast doubt on the value of extensive use of computers in the class-
  room to improve achievement.
• Highlighted the fact that level of national expenditure on education
  is not associated with achievement (among participating countries).
• Prompted general policy debate on education (Germany).
• Contributed to the development of the secondary-school science
  curriculum (Ireland).
• Emphasized the complexity of the relationship between socio-
  economic status and reading achievement across countries.
• Underscored the link between achievement and school types and
  curriculum tracking within schools.
• Supported the notion that public and private schools tend to have
  the same effects for the same kinds of pupils but that private govern-
  ment-dependent schools are relatively more effective for pupils
  from lower socioeconomic levels.
• Stressed the need for intensive language and reading programs for
  foreign-born students to help boost achievement (Switzerland).



PROBLEMS WITH INTERNATIONAL ASSESSMENTS

Despite obvious advantages, a number of problems associated with
international assessments merit consideration before countries decide
to participate in one (see Kellaghan 1996).
   First, an assessment procedure that will adequately measure the
outcomes of a variety of curricula is difficult to design. Although cur-
ricula across the world have common elements, particularly at the
primary-school level, considerable differences between countries also
exist in what is taught, when it is taught, and what standards of
achievement are expected.
   South Africa’s review of TIMSS items shows that only 18 percent
of the science items matched the national curriculum of grade 7,
while 50 percent matched the grade 8 curriculum (Howie and Hughes
2000). The greater the difference between the curricula and levels of
achievement of countries participating in an international assessment,
the more difficult it is to devise an assessment procedure that will suit
all countries, and the more doubtful is the validity of any inferences
that are made about comparative achievements.
   We would expect an achievement test that is based on the content
of a national curriculum to provide a more valid measure of curriculum
mastery than would one that was designed to serve as a common
denominator of the curricula offered in 30 to 40 countries. For example,
a national curriculum authority and the designers of an international
assessment might assign quite different weights of importance to a
skill such as drawing inferences from a text. A national assessment, as
opposed to an international assessment, can also test curricular aspects
that are unique to individual countries.
   Devising a common assessment instrument is more difficult for
some curriculum areas (for example, science and social studies) than
for others (for example, reading). In the case of science, for example,
achievement patterns have been found to be more heterogeneous
than in mathematics. Furthermore, a greater number of factors are
required to account for student performance differences in science
than in mathematics. Thus, a science test that would be appropriate
for a variety of education systems is difficult to envisage.
   A second problem with international studies is that—although early
studies had the ambitious aim of capitalizing on the variation that exists
in education systems to assess the relative importance of a variety of
school resources and instructional processes—this goal, in practice,
turned out to be very difficult to achieve. Because the relative effect of
variables depends on the context in which they are embedded, prac-
tices associated with high achievement in one country cannot be
assumed to show a similar relationship in another. In fact, the strength
of correlations between background factors and achievement has been
found to vary from country to country (see, for example, OECD and
UNESCO Institute for Statistics 2003; Wilkins, Zembylas, and Trav-
ers 2002). Particular difficulties exist when developing countries are
involved in a study designed for industrial countries because socio-
economic factors in such countries can differ very much from those that
prevail in industrial countries and can include poverty, nutritional and
health factors, and poor educational infrastructure and resourcing.
   Third, the populations and samples of students participating in
international assessments may not be strictly comparable. For example,
differences in performance might arise because countries differ in the
extent to which categories of students are removed from mainstream
classes and so may be excluded from an assessment (for example,
students in special programs or students in schools in which the
language of instruction differs from the language of the assessment).
The problem is most obvious where (a) age of enrollment in school,
(b) retention rates, and (c) dropout rates differ from one country to
another; it is particularly relevant in studies in which both industrial and
developing countries participate. In some developing countries, large
proportions of students have dropped out well before the end of the
period of compulsory schooling. Whereas primary school net enroll-
ment ratios for Western Europe and North America are almost 100
percent, the ratios for countries in Sub-Saharan Africa are, on aver-
age, less than 60 percent (UNESCO 2002). Patterns of early dropout
can differ from country to country. In Latin American and Arab
countries, boys are more likely than girls not to complete grade 5; the
reverse is true in some African countries (for example, Guinea and
Mozambique). Sampling problems for TIMSS appeared in the
Republic of Yemen, where several schools did not have grade 4 classes
and where one school for nomadic children could not be located.
   Similar comparability problems can arise in a national assessment.
For example, the differential performance of students in states in
India has been attributed to differential survival rates (see A.1 in
appendix A).
   Fourth, because variation in test score performance is an important
factor if one is (a) to describe adequately the achievements of students
in the education system and (b) to determine correlates of achieve-
ment, carefully designed national tests must ensure a relatively wide
distribution of test scores. However, many items in international
assessments have been too difficult for students from less industrial
countries, resulting in restricted test score variance. This result is
reflected in the data presented in table 6.2, which are based on a
selection of countries that participated in TIMSS 2003.
   The data show the percentage of grade 8 students in each country who
reached each of the TIMSS international benchmarks of performance.
Averaged across all participating countries, 7 percent of students
achieved the “advanced” benchmark, 23 percent the “high” benchmark,
about one-half the “intermediate” benchmark, and roughly three-quarters
the “low” benchmark. In sharp contrast, only 2 percent of Ghanaian
students achieved the “intermediate” benchmark and 9 percent the “low”
benchmark; none reached the “advanced” or “high” international benchmarks.

TABLE 6.2
Percentage of Students Reaching TIMSS International Benchmarks in
Mathematics, Grade 8: High- and Low-Scoring Countries

Countries                          Advanceda            Higha       Intermediatea        Lowa
Singapore                               44                77               93             99
Chinese Taipei                          38                66               85             96
Korea, Rep. of                          35                70               90             98
International average                    7               23                49             74
Philippines                              0                3                14             39
Bahrain                                  0                2                17             51
South Africa                             0                2                 6             10
Tunisia                                  0                1                15             55
Morocco                                  0                1                10             42
Botswana                                 0                1                 7             32
Saudi Arabia                             0                0                 3             19
Ghana                                    0                0                 2              9
Source: Mullis and others 2004, 64.
a. Definitions used in TIMSS 2003: Advanced: Students can organize information, make generaliza-
tions, solve nonroutine problems, and draw and justify conclusions from data. High: Students can
apply their understanding and knowledge in a wide variety of relatively complex situations.
Intermediate: Students can apply basic mathematical knowledge in straightforward situations.
Low: Students have some basic mathematical knowledge.


   Similarly, on PISA 2003, the limited use of the assessment for
internal policy making was underscored by the lack of test score vari-
ance in a number of participating countries; the majority of 15-year-
olds in Brazil, Indonesia, and Tunisia scored below Level 1. (Level 2
has been suggested as a minimum requirement for students entering
the world of work and further education.) Clearly, the information
that those studies provide for policy makers and decision makers on
the range of student achievements in these education systems is
limited. Furthermore, because of the limited variance in achieve-
ment, correlations between achievement and background or school
variables would throw little light on the factors that contribute to
achievement.
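   A small simulation can make this attenuation concrete. The sketch below is
illustrative only; the variables and numbers are invented and are not taken
from PISA or TIMSS data. It shows how a floor effect (a test that is too
difficult for most students) compresses score variance and weakens the
observed correlation with a background factor.

    # Illustrative sketch (invented data): restricted score variance
    # attenuates the correlation between achievement and a background factor.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    ses = rng.normal(0, 1, n)                # hypothetical background index
    score = 0.5 * ses + rng.normal(0, 1, n)  # hypothetical achievement score

    full_r = np.corrcoef(ses, score)[0, 1]

    # Floor effect: the bottom 60 percent of scores collapse into a narrow
    # band, as happens when most items are too hard for most students.
    floored = np.maximum(score, np.quantile(score, 0.6))
    restricted_r = np.corrcoef(ses, floored)[0, 1]

    print(f"correlation with full variance:       {full_r:.2f}")
    print(f"correlation with restricted variance: {restricted_r:.2f}")
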
   Fifth, a problem arises when the primary focus in reporting the
results of an international assessment is on the ranking of countries in
terms of the average scores of their students, which are usually the
main interest of media. Rankings in themselves tell us nothing about
the many factors that may underlie differences between countries in
performance. Furthermore, rankings can be misleading when the
statistical significance of mean differences in achievement is ignored.
A country’s rank can vary depending on the countries that participate,
an important consideration when rankings over time are compared.
Thus, for example, if the number of traditionally high-achieving coun-
tries decreases and the number of traditionally low-achieving countries
increases, a country’s ranking may increase without necessarily imply-
ing an improvement in achievement.
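   Before interpreting a difference in rank, analysts typically check whether
the difference between two mean scores is larger than would be expected from
sampling error. The sketch below uses invented numbers; in practice the
published standard errors (which already reflect the complex sample design)
would be used, and comparisons involving many countries would also adjust
for multiple testing.

    # Illustrative sketch (invented values): is a difference in country means
    # statistically significant, or could it reflect sampling error alone?
    from math import sqrt
    from scipy.stats import norm

    mean_a, se_a = 512.0, 3.1   # hypothetical country A mean and standard error
    mean_b, se_b = 506.0, 4.4   # hypothetical country B mean and standard error

    diff = mean_a - mean_b
    se_diff = sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    p = 2 * (1 - norm.cdf(abs(z)))

    print(f"difference = {diff:.1f}, z = {z:.2f}, p = {p:.3f}")
    # A nonsignificant result means the two countries' ranks should not be
    # read as evidence of a real difference in achievement.
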
   Sixth, poor performance in an international assessment (as well as in
a national assessment) can carry with it some political risks for key offi-
cials associated with the delivery of education, including ministers and
secretaries of education ministries. The risk is likely to be greater when
the international rank of a country is lower than that of a traditional
rival country. In some countries in which data were collected, officials
refused to allow the results to be included in between-country pub-
lished comparisons. (IEA no longer permits participating countries to
opt out of comparisons.) Obtaining comparative data for neighboring
countries or countries within a region would seem more appropriate
than obtaining data for countries across the world that differ greatly in
their level of socioeconomic development. An example of this approach
is found in Latin America and the Caribbean, where 13 countries
jointly carried out an assessment of basic competencies in language and
mathematics in 1997 (see C.3 in appendix C). The SACMEQ assess-
ments in southern and eastern Africa that were carried out under the
auspices of a network of ministries in the 1990s also allowed for inter-
national comparisons at a regional level (see C.1 in appendix C).
   Seventh, the demands of meeting deadlines may prove very
difficult in countries that lack administrative personnel and that have
to cope with a poor communications infrastructure (see box 6.1).
The time allowed for carrying out various tasks (for example, printing
or distributing booklets), which are associated with an international
assessment and which may be deemed reasonable in industrial
countries, may be insufficient given the range of basic problems—
including poor communication systems—that exist in many devel-
oping countries.

 BOX 6.1


  South Africa’s Experience with International Assessments

     South Africa’s experience with TIMSS underlines the problems facing
     implementers of international assessments. Deadlines imposed by
     organizers can be difficult, if not impossible, to meet in situations where
     mail service, telephone service, or funds for travel to schools are
     inadequate.
       Other problems include lack of accurate population data on schools;
     poor management skills; insufficient attention to detail, especially in
     editing, coding, and data capture; lack of funding to support project
     workers; and difficulty in securing quality printing on time. Instructions to
     test administrators (for example, to walk up and down the aisle) are
     obviously inappropriate when classrooms do not have an aisle.
     Source: Howie 2000.




   Finally, substantial costs are associated with participation in an inter-
national study. A country participating in TIMSS for grade 8 was
expected to pay US$40,000 in addition to all costs associated with
printing, distribution, test administration, data entry, and scoring. Na-
tional assessments, of course, also have considerable associated costs.

CHAPTER 7

CONCLUSION


                         Readers who have persevered to this point should
     be familiar with the main features of national and international assess-
     ments, with how the assessments are similar and how they differ, with
     the reasons for engaging in an assessment, and with the problems to
     look out for in the process. Readers also should have a general under-
     standing of the main activities involved, including identification of key
     policy issues, construction of instruments, selection of schools and of
     students to represent the education system, analysis of data to describe
     student achievements and their correlates, and communication of
     findings to a range of audiences. Specialized knowledge and skills are
     required for all those tasks.
        If the reader is a senior policy maker or manager in a ministry of
     education, he or she is unlikely to possess any of the specialized knowl-
     edge or skills that are involved in the details of executing a national
     assessment. This lack does not mean that he or she does not have a
     crucial role to play in an assessment—from its initiation and general
     design, to facilitating its implementation, and to interpreting and
     applying its findings. In this chapter, we pay particular attention to the
     role of the policy maker or manager in the development and institu-
     tionalization of national assessment activity and in the optimal use of
     assessment findings.

   Senior policy makers or managers who are in a position to make
decisions about whether to undertake a national assessment (or to
participate in an international assessment) should be convinced that
the information the assessment will provide will be useful in identify-
ing problems in the education system and in informing policy and
practice to address those problems. Their commitment is likely to be
enhanced if the assessment meets five conditions.
   First, the student achievements that are assessed are considered
important outcomes of schooling and adequately reflect the curriculum.
Second, the instrument used in the assessment has the potential to
provide diagnostic information about aspects of student achievement,
in particular, strengths and weaknesses in the profile of achievement.
Third, the method of sampling (if the assessment is sample based)
ensures that the data that are collected adequately represent the
achievements of the education system as a whole (or a clearly identi-
fied part of it). Fourth, appropriate analyses are used to identify and
describe the main features of the data, including relationships between
significant variables. Fifth, the technical aspects of the assessment
meet current professional standards in areas such as test development,
sampling, and statistical analysis.
   All those activities require considerable resources and political
support. For example, the policy maker or manager has a crucial role
in ensuring that the knowledge and skills that are required to design,
manage, and interpret a national assessment are available. In many
countries, they will not be available locally and will have to be devel-
oped specifically to carry out an assessment. This development will
require initial long- or short-term training programs. Following those
programs, provision should be made for increasing the technical skills
of those involved in the administration of a national assessment on a
regular basis through in-country training programs, attendance at
professional meetings, and more long-term graduate study.
   In some countries, national assessment activity seems to operate on
the fringes of the education system, divorced from the normal struc-
ture and processes of policy and decision making. In this situation, no
guarantee exists that the information obtained in an assessment will
be used to guide policy or that national assessments will be carried
out in the future to monitor how achievement might change over
time. To address those issues, national assessment activity should
become a normal part of the functioning of the education system.
This activity will require active involvement of some senior policy
makers in the overall design of the assessment and in either participa-
tion in, or representation on, the national steering committee. It will
also require an adequate budget and a decision about the location of
the activity, which will vary from country to country depending on
local circumstances.
    Long-term government commitment is very important in building a
strong institutional base for carrying out regular national assessments.
It can permit an agency to recruit and train individuals with key exper-
tise in areas such as test development, sampling, and statistical analysis.
Weak commitment can be reflected in a pattern of assigning national
assessment to different agencies, a strategy that does little or nothing to
build up much-needed technical expertise in the relevant disciplines.
In more than one country, multiple agencies have carried out separate
national assessments, using a range of approaches of limited value for
education policy making.
    In some instances, government commitment can be increased when
a unit within the ministry—supported by a line item in the education
budget—carries out the assessment. In Chile, for example, govern-
ment commitment and responsiveness to the results of the Sistema de
Medición de la Calidad de la Educación (SIMCE) increased when the
national assessment was transferred from a university to the ministry.
Annual assessment, timely reporting of results, and an appreciation of
the value of the results for policy making helped strengthen SIMCE’s
legitimacy, institutionalize its work, and ensure further long-term
government commitment and support. In a number of other Latin
American countries, assessment institutes, which are independent of
the ministry of education, have succeeded in developing a record of
competency and autonomy, thus conducting assessments with consid-
erable flexibility and consistency (Ferrer 2006).
   Institutionalization in itself is not enough, although it probably
would go some way toward ensuring that national assessment findings
reach key government personnel. A need also exists to invest effort in
devising procedures to communicate findings to stakeholders inside and
outside the ministry.
Apart from government officials, national assessment findings are
relevant to the work of curriculum developers, examination bodies,
teacher educators, and teachers in their everyday practice in schools.
Addressing the information needs of this variety of audiences requires
production of a number of reports and adoption of various dissemi-
nation strategies. Strategies should identify potential users (key
institutions and individuals) and their level of technical expertise. A
technical report that provides sufficient information to allow a
replication of the study is required, but the technical data also need
to be translated into forms that are accessible to nontechnical users.
These may include a summary report (for example, for the public) and a
more detailed report for policy makers indicating, for example,
(a) whether the system is underserving any particular group, (b) whether
gaps warrant remedial action, and (c) whether factors associated with
superior performance can be identified.
   In many countries, policy making tends to be influenced by political
priorities and the perceptions of ministers and senior officials. It is
frequently prompted by personal experiences and anecdotal informa-
tion, as well as by political pressure. Far too rarely is it informed by the
results of an analysis of valid and reliable data on the functioning of
the education system, such as can be provided by a well-designed and
implemented national assessment.
   Policy makers should provide leadership in ensuring that objective,
reliable evidence on the functioning of the education system provided
by the national assessment is used to help improve the overall quality
of policy making. They can do so by examining and reflecting on the
relevance of the national assessment results for policy making in areas
such as gender and regional equity, provision of educational materials
in schools, teacher qualifications, and provision of in-service courses
for teachers. They can reflect on whether changes introduced since
the previous national assessment appear to have affected student
achievement. They can encourage and support providers of teacher
education (preservice and in-service) courses to study the findings and
adjust current practices where evidence indicates the need for adjust-
ment. Policy makers can also advise a curriculum authority on changes
in curriculum content when evidence clearly indicates that students
find the material much too easy or, more likely, too difficult.
   Close involvement of policy makers, both at the outset in the overall
design of the assessment and again when the assessment is complete in
discussing the relevance of its results, can help ensure that they come
to appreciate the value of a national assessment. Over time, it may be
hoped, policy makers will come to regard a national assessment as a key
policy-making instrument.

   Brief descriptions of national assessment practices in nine countries
are presented in appendix A. The descriptions are not exhaustive,
and the cases are not presented as perfect models of good practice.
Several of them, in fact, are defective in a number of technical aspects.
They do, however, reveal similarities and differences in approach
that are of interest. Similarities are reflected in the fact that—in all
countries—assessments were carried out in language/literacy and mathe-
matics/numeracy at one or more primary-grade levels. In all countries,
assessments that were based on samples were carried out. In Chile
and Uruguay, assessments in which the population of schools partici-
pated were also carried out.
   Differences between countries are reflected in the frequency of
assessment, with the interval between assessments ranging from one to
four years. The agency respon-
sible for implementation of the assessment also varied and included
the ministry of education, a government-supported research institute,
and a national examinations board. Considerable nonnational support
was available to the implementing agency in several countries. In at
least two countries (Chile and South Africa), the implementation
agency changed between assessments.
   The way in which student achievement was described varied from
citing the mean and distribution of the number of items to which stu-
dents responded correctly, to determining the percentage of students
whose performance reached “expected” standards or the percentage
scoring at varying levels of “proficiency.” Methods of analysis also
varied considerably, probably a reflection of the technical capacity of
national assessment teams. Sophisticated analytic approaches were used
in some countries (for example, the United States and Vietnam).
   The use of results from assessments seemed to vary a good deal,
although this conclusion is not certain because not a great deal of
information is available in most countries on the extent to which
results have been disseminated or have been effective in contributing
to policy formation. As well as describing gender differences, some
countries have used the results of a national assessment to support
the following actions:

• Provide policy recommendations for the education sector (Sri
  Lanka, Vietnam).
• Document regional disparities in achievement (Nepal, South Africa,
  Sri Lanka).
• Design a major in-service program for teachers (Uruguay).
• Provide financial and other forms of support to low-scoring
  schools (Chile).
• Bring strengths and weaknesses in student achievements to the
  notice of teachers (Uganda).
• Describe changes in the achievements of minority-group students
  over time (United States).
• Suggest a reduction in the emphasis on algebra and geometry in
  the curriculum (Bhutan).

  Those involved in the design of a national assessment might like to
consider a number of somewhat unusual practices that are features of
the assessments described in appendix A:

• Launching a public-awareness campaign prior to the assessment
  (Chile).
• Collecting data on school facilities in conjunction with data on
  student achievement to monitor the extent to which those facilities
  improve over time (Vietnam).
• Administering the achievement test to teachers as well as to stu-
  dents (India, Vietnam).
• Working closely with teacher unions to carry out the assessment
  (Uruguay).

   Appendix B provides descriptions of the main features of three cur-
rent, large-scale, international studies that span the globe. Those studies
focus on reading/literacy, mathematics/numeracy, and science—three
areas of knowledge and skill that would probably be regarded as
“core” in students’ education in all countries. All three studies are also
concerned with monitoring student achievement over time.
   The level of technical competence in international studies is very
high, and countries can improve their knowledge and skill by partici-
pating. Many countries, as we have seen, also use the data collected in
an international assessment to carry out national-level analyses, in
effect using the international assessment as a national assessment. This
procedure can be enriched if national-level background information is
collected in addition to that required in the international study.
   The design of international studies is very similar to the design of
a national assessment, except that cognizance has to be taken of the
fact that the assessment will be carried out in a number of countries.
Thus, assessment instruments may not be equally appropriate in all
countries, either because they do not adequately represent school
curricula, which vary from country to country, or because they do
not adequately reflect the range of student achievements, which can
vary enormously from country to country. Two approaches have
been adopted to address variation in school curricula. In the Trends
in International Mathematics and Science Study (TIMSS) (B.1 in
appendix B), as in earlier studies carried out under the auspices of
the International Association for the Evaluation of Educational
Achievement, tests are developed in a consensus-building exercise
among participating countries in which common elements of their
curricula are included in tests. The approach of the Programme for
International Student Assessment (PISA) (B.3 in appendix B) has
been not to base assessment instruments on an analysis of curricula,
but to use “expert” opinion to determine the knowledge and skills
that 15-year-olds should have acquired near the end of compulsory
education if they are to participate fully in society.
   The fact that student achievement is related to countries’ economic
development means that assessments designed for industrial countries
(such as TIMSS and PISA) are unlikely to provide a satisfactory
description of achievement in a developing country. Regional studies
for less industrial countries have been created to address this issue, and
three such studies—two in Africa and one in Latin America—are
described in appendix C. Those studies act as both national and
international assessments.

APPENDIX A

COUNTRY CASE STUDIES



A.1. India

Purpose. An assessment was developed to help the government of
India provide baseline data on the quality of education for each of
its states. The assessment was part of the government’s Sarva Shiksha
Abhiyan (SSA) program, which aimed to achieve universal enroll-
ment up to the completion of elementary education by 2010. Earlier
large-scale achievement assessments had been carried out in desig-
nated school districts as part of the government’s District Primary
Education Project (Prakash, Gautam, and Bansal 2000). Mean
scores for mathematics and language were compared by district,
subject area, and grade level. The assessment concluded that students
performed better in language than in mathematics and that the average
achievement of older students was not as impressive as that of students
in lower grades. The majority of differences within districts between
boys and girls in mathematics and in language were not statistically
significant. In addition to this district-level assessment, a large-scale
assessment was carried out in 22 states in the early 1990s (Shukla
and others 1994).

Frequency. Every three years.


Grades. The grade 5 assessment was administered in 2001–02. Grade
3 and the terminal grade for elementary education (which varies from
state to state) were also assessed.

Achievements assessed. Language and mathematics.

Who did it? National Council of Educational Research and Training,
Delhi, with the support of the District Institutes of Education, which
supervised the data collection.

Sample or population. Sample.

Analysis. Reported grade 5 scores for each state in terms of the per-
centage of items answered correctly.

Use of results. Grade 5 results showed small gender and rural-urban
gaps in achievement levels. The data will be used to monitor changes
in levels of educational achievement and to identify educational and
noneducational factors that may help account for differences in student
achievement.

Interesting points. An earlier large-scale 22-state assessment adminis-
tered the same test to teachers and students. In one state with very
low mean student scores, only 1 of 70 teachers who took the test
answered all 40 arithmetic items correctly. Among the teachers,
10 percent answered fewer than half the items correctly (Shukla and
others 1994).
   The national assessment will be used to help monitor the effect of
the SSA initiative. Unlike most other national assessments, scores
are reported in terms of overall percentage of items answered cor-
rectly. States with particularly poor achievement levels are expected
to receive special attention. Some states with strong education tradi-
tions in terms of school participation rates (for example, Kerala and
Himachal Pradesh) recorded relatively low mean scores on the grade
5 assessment, while some of the states with relatively low school
participation rates (for example, Bihar, Orissa, and West Bengal)
scored higher. This outcome, which was also reported in the earlier
22-state assessment, is explained by the fact that in the latter states,
the samples of students taking the tests tended to be “survivors” in
the education system; many of the less advantaged students in terms
of home background and ability levels would have dropped out of
school by grade 5.

Source: India, National Council of Educational Research and Training,
Department of Educational Measurement and Evaluation 2003.


A.2. Vietnam

Purpose. To measure the quality of education with a particular focus
on student achievement at the primary level.

Frequency. Previous small-scale assessments had been carried out
between 1998 and 2000 at grades 3 and 5, but they were inappro-
priate for providing benchmark information for monitoring trends
over time.

Grade. 5.

Achievements assessed. Vietnamese reading and mathematics in 2001.

Instruments. Achievement tests; pupil, teacher, and school question-
naires.

Who did it? Ministry of Education and Training supported by other
national agencies and an international team supported by the World
Bank and the Department for International Development of the
United Kingdom.

Sample or population. Sample was designed to be representative of
the national population and populations in each of 61 provinces.

Analysis. Analyses included cross-tabulations of achievement data and
school data by region, correlates of achievement, factor analysis, item
response modeling of test item data, and hierarchical linear modeling
for identification of factors associated with achievement.

Use of results. Government officials made 40 policy recommendations
that were based on the overall results.

Interesting points. Tests included items from the 1991 International
Association for the Evaluation of Educational Achievement Reading
Literacy Study (Elley 1992, 1994) that were used to compare results
with other countries. The same tests were administered to teachers
and students; 12 percent of students scored higher than 30 percent
of teachers. Fewer than 3 percent of schools had obligatory school
resources (for example, library, piped water). More than 80 percent
of pupils were in classrooms that had minimal resources (writing
board, chalk, and so on) while 10 percent were being taught by teach-
ers who had not completed secondary school.
   Six levels of proficiency were established according to students’
performance on the reading test:

• Level 1. Matches text at word or sentence level aided by pictures.
  Restricted to a limited range of vocabulary linked to pictures.
• Level 2. Locates text expressed in short repetitive sentences and can
  deal with text unaided by pictures. Text is limited to short sentences
  and phrases with repetitive patterns.
• Level 3. Reads and understands longer passages. Can search backward
  or forward through text for information. Understands paraphrasing.
  Expanding vocabulary enables understanding of sentences with some
  complex structure.
• Level 4. Links information from different parts of the text. Selects
  and connects text to derive and to infer different possible meanings.
• Level 5. Links inferences and identifies an author’s intention from
  information stated in different ways, in different text types, and in
  documents where the information is not explicit.
• Level 6. Combines text with outside knowledge to infer various
  meanings, including hidden meanings. Identifies an author’s pur-
  poses, attitudes, values, beliefs, motives, unstated assumptions, and
  arguments.

   There was considerable variation in the level of student performance
on both the reading and mathematics tests. For example, far fewer
students attained the two highest levels of reading in Ha Giang and
Tien Giang than in Da Nang (table A.2.1). The relationship between teacher
characteristics and students’ scores was examined after taking home
background into account (table A.2.2).

Source: World Bank 2004.

TABLE A.2.1
Percentages and Standard Errors of Pupils at Different Skill Levels in Reading

 Province      Unit indicator   Level 1   Level 2   Level 3   Level 4   Level 5   Level 6
 Ha Giang      Percentage         7.5      22.1      27.4      18.7      18.5       5.7
               SE                 1.66      3.23      3.06      2.97      3.07      2.09
 Tien Giang    Percentage         2.8      13.4      28.8      20.2      22.4      12.5
               SE                 0.7       2.0       2.49      1.8       2.46      2.78
 Da Nang       Percentage         0.8       5.7      15.4      21.3      32.9      24.1
               SE                 0.34      0.88      1.79      1.89      1.98      3.23
 Vietnam       Percentage         4.6      14.4      23.1      20.2      24.5      13.1
               SE                 0.17      0.28      0.34      0.27      0.39      0.41
 Source: World Bank 2004, vol. 2, table 2.3.
 Note: SE = standard error.




TABLE A.2.2
Relationship between Selected Teacher Variables and Mathematics
Achievement
                                                                        Partial correlation, after
                                                                          taking pupil’s home
 Teacher variable                         Simple correlation            background into account
 Sex of teachera                                  0.17                             0.14
 Academic education                               0.08                             0.04
 Subject knowledge of
 mathematics                                      0.29                             0.25
 Classified as “excellent
 teacher”                                         0.18                             0.13
 Classroom resources                              0.24                             0.15
 Number of hours
 preparing and marking                            0.00                             0.01
 Frequency of meeting
 with parents                                     0.05                             0.04
 Number of inspection
 visits                                           0.13                             0.11
 Source: World Bank 2004, vol. 2, table 4.38.
 Note: Correlations greater than 0.02 are statistically significant.
 a. Pupils taught by female teachers scored higher.
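   The partial correlations in table A.2.2 can be thought of as correlations
computed after a linear adjustment for home background. The sketch below,
with invented data, shows one common way of computing such a statistic:
residualize both the teacher variable and the achievement score on the
background measure and correlate the residuals. (The actual Vietnam analyses
were more elaborate, including item response modeling and hierarchical
linear modeling.)

    # Illustrative sketch (invented data): a partial correlation between a
    # teacher variable and pupil achievement, controlling for home background.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 2_000
    home = rng.normal(0, 1, n)                            # hypothetical home background index
    teacher_knowledge = 0.3 * home + rng.normal(0, 1, n)  # hypothetical teacher variable
    score = 0.5 * home + 0.3 * teacher_knowledge + rng.normal(0, 1, n)

    def residuals(y, x):
        # Residuals of y after removing a linear effect of x.
        slope, intercept = np.polyfit(x, y, 1)
        return y - (slope * x + intercept)

    simple_r = np.corrcoef(teacher_knowledge, score)[0, 1]
    partial_r = np.corrcoef(residuals(teacher_knowledge, home),
                            residuals(score, home))[0, 1]

    print(f"simple correlation:  {simple_r:.2f}")
    print(f"partial correlation: {partial_r:.2f}")  # smaller, as in table A.2.2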

A.3. Uruguay

Purpose. The national assessment aimed to identify (a) the extent to
which primary school graduates had developed a “fundamental under-
standing” of language and mathematics, and (b) the sociocultural factors
that may have a bearing on student achievement. The assessment em-
phasized professional development, which included diagnosing learning
problems, giving teachers information about student performance, and
helping them improve teaching and evaluation. The assessment also
aimed to use the data from the tests and questionnaires to improve
school conditions.

Frequency and grade. Grade 6 (every three years) in 1996, 1999,
2002, and 2005. In addition, grades 1, 2, and 3 were assessed for
teacher development purposes in 2001. Grade 9 was tested in 1999
and grade 12 in 2003. Since 2003, 15-year-olds have also been assessed
as part of the Programme for International Student Assessment (PISA).

Achievements assessed. Mathematics (problem solving) and reading
comprehension in grade 6; mathematics, language, and natural and
social sciences in grades 9 and 12.

Instruments. Achievement tests; parent, teacher, and principal ques-
tionnaires.

Who did it? Early on, Unidad de Medición de Resultados Educativos
(UMRE), a unit created as part of a World Bank–financed project,
was responsible for the national assessment at grade 6 while Programa
de Modernización de la Educación Secundaria y Formación Docente
(MESyFOD), an Inter-American Development Bank–funded project, was responsi-
ble for the national assessment at the secondary level. Since 2001, the
assessment activities have been unified and institutionalized under
the Gerencia de Investigación y Evaluación (Research and Assess-
ment Division), part of the National Administration for Public
Education. Finance is provided by international donor agencies.

Sample or population. Population and sample of grade 6 students,
excluding very small rural schools; population of grade 9 students; sam-
ple of grades 1, 2, 3, and 12; sample for PISA assessments.

Analysis. UMRE used 60 percent correct as an index of adequacy of
pupil performance. Individual school scores were compared to the
national average, to the departmental or regional average, and to schools
serving students from similar socioeconomic backgrounds. Achieve-
ment test data were related to background factors.
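   As an illustration of how such an index works, the sketch below (with
invented scores) applies the 60 percent-correct criterion to one
hypothetical school and compares the result with a hypothetical national
figure; the actual UMRE reports were, of course, based on the real test data.

    # Illustrative sketch (invented numbers): UMRE-style adequacy criterion
    # of 60 percent correct, applied to one hypothetical school.
    num_items = 40
    pupil_scores = [31, 22, 27, 35, 18, 29, 33, 24]   # items correct per pupil

    percent_correct = [100 * s / num_items for s in pupil_scores]
    adequate = [p >= 60 for p in percent_correct]

    school_rate = 100 * sum(adequate) / len(adequate)
    national_rate = 57.0                               # hypothetical national figure

    print(f"pupils at or above 60 percent correct: {school_rate:.0f}% "
          f"(national: {national_rate:.0f}%)")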

Use of results. Results were used mainly by teachers, principals, and
the school inspectorate. The government used the results to identify
schools for special support and for large-scale, in-service, teacher-train-
ing programs. National-level results were widely publicized. Forty days
after testing and before the end of the school year, participating schools
received a confidential report with aggregate school results presented
item by item. The reports did not include individual student results or
results disaggregated by classroom. UMRE (a) produced teaching
guides to help address perceived weaknesses in language and mathe-
matics and organized in-service, teacher-training programs for schools
in disadvantaged areas, (b) prepared reports for supervisory personnel,
and (c) held workshops for inspectors that drew on the test results.
Tests were made available to schools other than the sampled ones. Every
school received a report of national averages for each competency
tested. Nontested schools were sent norms for comparative purposes.
Close to 80 percent of those schools administered the tests and
compared their results to supplied national norms. Inspectors held
their own workshops to develop an understanding of the results, to
appreciate the effect of social deprivation on student learning out-
comes, and to suggest courses of action to enhance educational quality.

Interesting points. Initially the teachers’ union at the primary level was
strongly opposed to the national assessment. In particular, it opposed
the publication of individual school results. Eventually, the union was
won over by the government’s agreement not to publish results for
individual schools or teachers, but to allow the results to be used for
diagnostic purposes. Only aggregate data were to be published. In
addition, the government invited teachers to participate (a) in the
groups that planned the assessment and (b) in other advisory groups.
Teachers were also heavily involved in test development. To date, little
opposition has arisen to formal assessment of this type at the primary
level. There has been a general acceptance that teachers or schools will
not be penalized for poor test results. The secondary teachers’ union
has not been very supportive of the assessment and has adopted a
wait-and-see attitude. The acceptance by teachers of the UMRE initia-
tive and of the results is attributable to confidentiality of test results,
prompt reporting, contextualization of test scores by sociocultural
background, and acknowledgment that student outcomes depend on a
combination of factors (including household, school, community, and
teacher variables).
   Although governments in some countries are seeking ways to hold
schools and teachers accountable for student outcomes, Uruguay
takes a different approach. The state takes responsibility for promoting
an enabling environment to help achieve equity within the educa-
tion system.

Sources: Benveniste 2000; Ravela 2005.


A.4. South Africa

Purpose. South Africa has conducted a series of national assessments at
grades 3, 6, and 9. It also participated in three international studies (a)
to provide baseline data against which future progress could be
monitored and (b) to allow South Africa to compare curricula and
achievement in mathematics and science with those in industrial
countries. Each of the international studies could be considered a na-
tional assessment of educational achievement. Participation in interna-
tional assessment provided an opportunity for capacity development.
   South Africa was the only African participant in the Trends in
International Mathematics and Science Study (TIMSS) in 1995, and it
participated with Morocco and Tunisia in TIMSS in 1999, and with
those countries and Botswana, Ghana, and Egypt in TIMSS in 2003.
South Africa also participated in the Southern and Eastern Africa
Consortium for Monitoring Educational Quality grade 6 assessment
that was carried out in 2000 and in the grade 4 Monitoring Learning
Achievement assessment, which commenced in 1992.

Frequency. TIMSS 1995, 1999, and 2003.

Grade. 8.

Instruments. Achievement tests; student, teacher, and principal ques-
tionnaires.

Achievements assessed. Mathematics and science.

Who did it? Human Sciences Research Council in 1995 and 1999, and
University of Pretoria in 2003.

Sample or population. Sample. One intact grade 8 class was sampled
in each selected school.

Analysis. The study compared student performance in mathematics
and science with that of other countries in terms of average perfor-
mance and performance at the 5th, 25th, 50th, 75th, and 95th percen-
tiles. It also compared South Africa with other participating countries
in terms of students’ backgrounds and attitudes, curricula, teacher
characteristics, classroom characteristics, and school contexts for learn-
ing and instruction. It included a comparison of mean performance
scores over time.

Use of results. TIMSS results have been used in parliamentary debates.

Interesting points. South Africa has 11 official languages. Some words
had to be translated into South African English, and some contexts
had to be modified. A considerable amount of time was devoted to
solving logistical problems that are attributable to the inadequacies of
services, such as mail and telephone, which are taken for granted else-
where. The national research team found deadlines imposed by
TIMSS difficult to honor. The initial effort at sampling unearthed
about 4,000 schools that were not in the national database. Transfer
of assessment-related skills between the teams that carried out the
three TIMSS assessments has been limited. Only one of the staff
members from the first TIMSS assessment team participated in TIMSS
2003. Most students took the test written in a language other than
their home language.
   The second TIMSS study was used for a detailed, in-country study
(Howie 2002). Findings included the following:

• Official class-size statistics differed from (and were much larger than)
  those found in the nationally representative sample of participating
  schools, which suggests inaccurate reporting of school enrollment data.
• Some students were afraid that their performance on the tests
  would count toward their official school results. Some were afraid
  to ask for help. Many struggled with open-ended questions. Late
  arrival, absenteeism, and cheating during test administration caused
  additional problems.
• Many students had problems completing tests and questionnaires
  because of language difficulties. Many teachers lacked the language
  fluency to communicate effectively with pupils.
• Teachers spent a lot of time teaching material that should have been
  covered in earlier grades.
• Close to one-quarter of the teachers of grade 8 students were not qual-
  ified to teach mathematics and had no postsecondary qualification.
• Pupils whose home language was either English or Afrikaans scored
  significantly higher than pupils who spoke another African language
  at home.
• Less than 0.5 percent of students achieved the highest level of mathe-
  matics performance, compared to 10 percent of the international
  sample. The mean score (381) for the highest scoring of the nine
  provinces (Western Cape) was significantly lower than the interna-
  tional TIMSS mean score (487).
• Neither school nor class size was a significant predictor of mathe-
  matics achievement.

   National assessments at grades 3, 6, and 9 requested by the Depart-
ment of Education were carried out to get baseline data for future
assessments and to suggest policy initiatives. Each of those assessments
used questionnaire data, as well as achievement test data, to provide a
basis for evaluating long-term efforts to improve access, quality, effi-
ciency, and equity. Provincial comparisons produced evidence of
strong regional differences in achievement. Overall performance levels
were considered low. For example, mean percentage-correct scores as
low as 38 percent were recorded for language, 27 percent for mathe-
matics, and 41 percent for natural sciences in the grade 6 assessment.
Separate grade 6 reports were prepared for each province, in addition to
a single national report.

Sources: Howie 2000, 2002; Kanjee 2006; Reddy 2005, 2006.

A.5. Sri Lanka

Purpose. To assess the achievements of pupils who had completed
grade 4 in 2003.
Frequency. Previous assessments had been carried out at grades 3
(1996) and 5 (1994, 1999). Further assessments have been carried out
at grade 4 (2007) and grades 8 and 10 (2005).
Grade. 4.
Achievements assessed. First language (Sinhala or Tamil), mathematics,
and English.
Instruments. Achievement tests; questionnaires administered to school
principals, sectional heads, class teachers, and parents (see table A.5.1).
Who did it? National Education Research and Evaluation Centre,
located in the Faculty of Education, University of Colombo.
Sample or population. Sample designed to be representative of the
national population of grade 4 students and of grade 4 populations in
each of the nine provinces.
Analysis. Comparisons of achievement scores by school type, loca-
tion, gender, and level of teacher training. Provinces and districts were
rank-ordered in each subject area. Path analysis was used to analyze
relationships between school, home background, and student factors,
on the one hand, and student achievement, on the other hand.
Use of results. Results were used for analysis of the education sector
to help develop a new strategy for government and donor support for
education and are currently being used to establish benchmarks
against which student achievement levels in each of the provinces are
being monitored.
Interesting points. The Sri Lankan national assessment team selected a
score of 80 percent as the cutoff point for determining “mastery.”1 The
percentages of students who were considered to have “mastered” each

1
 This determination was apparently based on a cutoff point used by the United Nations
Educational, Scientific, and Cultural Organization in earlier Monitoring Learning
Achievement studies (UNESCO 1990).

TABLE A.5.1
Background Data and Source in Sri Lankan National Assessment

Type of                                                                   Number of
information          Questionnaire           Sections                     questions

School               Principal               •   General background
background                                   •   Teacher profile
                                             •   School facilities
                                             •   Financial status
                                             •   Opinions                     37

                     Section head            • General background
                                             • School facilities
                                             • Teaching-learning-
                                               assessment procedures
                                             • Opinions                       13

                     Class teacher           • General background
                                             • Academic and profes-
                                               sional information
                                             • Classroom details
                                             • Opinions                       41

Home                 Parents                 •   General background
                                             •   Home facilities
                                             •   Socioeconomic status
                                             •   Learning support
                                             •   Opinions                     51

                     Students                •   General background
                                             •   Preschool education
                                             •   Post-school activities
                                             •   Opinions                     26

Source: Perera and others 2004, table 3.7.


of the three subject areas tested were reported. The results suggest that
the expected standard was set at an unrealistically high level. While the
report of the assessment concluded, on the basis of mean scores, that
overall performance in the first language “seems to be of a satisfactory
standard” (Perera and others 2004, 47), a different picture emerges when
performance is assessed against the mastery level. Fewer than
40 percent of students achieved mastery in the local language and in
mathematics, and fewer than 10 percent did so in English. Results
showed wide disparities in achievement among provinces and districts

TABLE A.5.2
Percentage of Students Achieving Mastery in the First Language, by
Province
                                                  Percentage         Target
Group                Rank          Province    achieving mastery   percentage
Above 50%              1       Western              53.5              80.0
26–50%                 2       Southern             42.6              80.0
                       3       North Western        42.2              80.0
                       4       Sabaragamuwa         40.2              80.0
                       5       North Central        35.6              80.0
                       6       Uva                  33.9              80.0
                       7       Central              33.8              80.0
1–25%                  8       Eastern              23.7              80.0
                       9       Northern             22.7              80.0
Source: Perera and others 2004, table 4.14.


(table A.5.2). Subgroups with low achievement levels were identified.
Separate reports were published for each of the country’s nine prov-
inces.

Source: Perera and others 2004.



A.6. Nepal

Purpose. The 2001 national assessment was carried out to determine
the extent to which student achievements had changed over a four-
year period during a time of major policy changes.

Frequency. Baseline data were obtained on grade 3 students in 1997.
(Grade 5 was assessed in 1999.)

Grade. 3.

Achievements assessed. Mathematics, Nepali, and social studies.

Instruments. Achievement tests of mathematics, Nepali, and social
studies that were administered to all sampled students. Questionnaires
were administered to headmasters and teachers of the three targeted
subject areas in each sampled school. Twenty-five percent of students
and their parents were interviewed.

Who did it? Educational and Developmental Service Centre.

Sample or population. A sample of 171 schools.

Analysis. Test scores above 75 percent correct merited a “satisfactory”
performance rating. Other analyses included reliability studies of each
test and comparisons of mean scores for 1997 and 2001. Analysis of
variance was used to compare mean score performances of students
across regions, and multiple regression analysis was used to identify
factors related to student achievement.
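   For readers unfamiliar with these techniques, the sketch below (with
invented data) shows the form such analyses take: a one-way analysis of
variance comparing mean scores across regions, followed by a simple
regression of scores on two hypothetical school factors. The actual Nepali
analyses were, of course, based on the sampled test and questionnaire data.

    # Illustrative sketch (invented data): ANOVA across regions and a simple
    # multiple regression, of the kind described above.
    import numpy as np
    from scipy.stats import f_oneway

    rng = np.random.default_rng(2)
    region_scores = [rng.normal(mu, 10, 200) for mu in (52, 55, 49, 58)]  # four regions
    f_stat, p_value = f_oneway(*region_scores)
    print(f"ANOVA across regions: F = {f_stat:.1f}, p = {p_value:.3f}")

    # Regression of scores on two hypothetical factors
    # (teacher trained, class supervised).
    n = 800
    trained = rng.integers(0, 2, n)
    supervised = rng.integers(0, 2, n)
    score = 50 + 3 * trained + 2 * supervised + rng.normal(0, 10, n)
    X = np.column_stack([np.ones(n), trained, supervised])
    coef, *_ = np.linalg.lstsq(X, score, rcond=None)
    print(f"estimated intercept and effects: {np.round(coef, 1)}")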

Use of results. Results were used to monitor changes in achievement
from 1997 to 2001 and, in particular, to evaluate the effect of policy
changes that included budgetary increase, new curricula, new text-
books and teaching materials, and new teacher centers and teacher
training centers. Highest-performing regions were identified. In 2001,
the difference between boys’ and girls’ mean scores was significant only
in the case of mathematics; boys recorded the higher mean scores.
Overall mean social studies scores were significantly higher in 2001
than in 1997.

Interesting points. The data helped identify curriculum areas where
students appear to have had some difficulty. In mathematics, students
generally were able to express numbers in words and words in numbers,
measure time and weight, add numbers of up to four digits, and add
decimal numbers. They tended to be unable to do word problems
involving any of the four basic operations (addition, subtraction, multi-
plication, division). In Nepali, the average student tended to be able to
read a simple story and use some vocabulary but could not correctly
answer questions based on written passages or on a pictorial story.
    Results of the assessment showed that many of the reforms appeared
to have had little effect. More than 60 percent of teachers indicated that
their classes were never supervised. They tended to receive relatively
little in-service support. About one-third were untrained. Classroom
instruction was deemed ineffective.
    The report concluded that although many reforms clearly had taken
place, it was probably too early to expect improvements in student
achievement. The national assessment report also highlighted the
relatively poor quality of home support for education. More than
one-quarter of mothers were classified as illiterate, while fewer than
7 percent had completed education up to grade 5.

Source: Khaniya and Williams 2004.



A.7. Chile

Purpose. Chile’s Sistema de Medición de la Calidad de la Educación
(SIMCE) was originally designed to help guide parents in school selec-
tion. It now seeks (a) to provide feedback on the extent to which
students are achieving the learning targets considered minimal by the
Ministry of Education; (b) to provide feedback to parents, teachers, and
authorities at municipal, regional, and central levels; and (c) to provide
data for policy makers to guide allocation of resources in textbook and
curriculum development and in in-service teacher education, especially
in the neediest areas. It aims to improve the education system by install-
ing procedures that stress evaluation, information, and incentives. It also
serves to underline the Ministry of Education’s commitment to improve
both quality and equity within the education system.
   Chile also runs a separate but related assessment system as a basis for
rewarding excellence under the SNED (National System of Teacher
Performance Assessment in Publicly Supported Schools) by providing
incentives to teachers and schools to raise student achievement levels.

Frequency. Annual.

Grades. 4 and 8.

Achievements assessed. Spanish (reading and writing), mathematics,
natural and social sciences.

Instruments. Achievement, self-concept, and perception tests completed
by pupils; questionnaires completed by principals, teachers, and parents
(one year only).

Who did it? First administered in 1978 by an external agency, the Pon-
tificia Universidad Católica de Chile, the SIMCE assessment is now
administered by the Ministry of Education.

Sample or population. All (practically all) students in the relevant
grades are assessed in Spanish and mathematics. Natural science, his-
tory, and geography tests are administered to 10 percent of students.
Very small schools in inaccessible locations are excluded.

Analysis. Schools receive a ranking in comparison with other schools
in the same socioeconomic category, as well as a national ranking.
SIMCE identifies 900 schools that score in the lowest 10 percent on the
mathematics and language tests within their provincial regions; those
schools receive special resources under the P-900 program.

Use of results. SIMCE results are used extensively in policy discussions.
SIMCE reports classroom results containing the average percentage of
correct answers for each objective assessed, as well as the average num-
ber of correct answers over the entire test. At the beginning of the school
year, SIMCE reports results nationally and also by school, location, and
region. SIMCE manuals explain the results and how teachers and schools
might use them to enhance student achievement. P-900 program
schools receive support in the form of improved infrastructure; text-
books and classroom libraries; teaching material; and in-service, school-
based workshops. Schools are removed from the P-900 program when
their SIMCE scores exceed the 10 percent cutoff limit.
   The SNED program uses SIMCE scores along with four other mea-
sures of school quality. Teachers in the best-performing schools within
a region receive a cash award roughly equivalent to a monthly salary. In
an effort to ensure equity, the ministry selects schools catering to simi-
lar socioeconomic groups that are classified in terms of urban or rural
location and elementary or secondary school level. Although a range of
factors is taken into account in calculating the index, school achieve-
ment accounts for almost two-thirds of the index score (table A.7.1).
The weighting system is regularly modified to reflect policy priorities.
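   As a purely illustrative sketch of how a weighted index of this kind
can be computed, the fragment below applies the 1998–99 weights from
table A.7.1 to component scores for a single school. The school and its
component scores are hypothetical, and the assumption that each
component is expressed on a 0–100 scale is made only for the example;
the sketch does not reproduce SNED’s actual scaling or selection rules.

# Illustrative only: a weighted SNED-style index using the 1998-99
# weights from table A.7.1. Component scores are hypothetical and each
# is assumed to lie on a 0-100 scale.
WEIGHTS = {
    "effectiveness": 0.37,              # SIMCE scores in math and science
    "value_added": 0.28,                # average SIMCE gain in score
    "initiative": 0.06,
    "work_conditions": 0.02,            # improvement in work conditions
    "equality_of_opportunity": 0.22,
    "parent_teacher_cooperation": 0.05,
}

def sned_index(component_scores):
    """Weighted sum of one school's component scores."""
    return sum(WEIGHTS[name] * score
               for name, score in component_scores.items())

school = {                               # hypothetical component scores
    "effectiveness": 62.0,
    "value_added": 55.0,
    "initiative": 70.0,
    "work_conditions": 80.0,
    "equality_of_opportunity": 58.0,
    "parent_teacher_cooperation": 65.0,
}

print(f"Merit index: {sned_index(school):.1f}")  # about 60 for these scores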

   Interesting points. SIMCE uses an intensive public-relations cam-
paign that includes brochures for parents and schools, posters for
schools, videos for workshops, television programs, and press releases.
Reports are distributed to principals, municipal leaders, school supervi-
sors, and ministry officials. Parents also receive an individualized report
for their school. Newspapers publish school-by-school results. Because

TABLE A.7.1
Index for Merit Awards for Schools in Chile, 1998–99
 Factor                                                          Percentage
 Effectiveness (SIMCE scores in math and science)                   37
 Value added (average SIMCE gain in score)                          28
 Initiative                                                          6
 Improvement in work conditions                                      2
 Equality of opportunity                                            22
 Parent-teacher cooperation                                          5
 Source: Delannoy 2000, table 1.5.




municipalities receive funding from the central government on a per
student basis, they have a vested interest in the outcome; good SIMCE
results tend to attract more students and hence more revenue.
   Schools that have a large number of absentees on the date of testing
do not receive results. Some schools overestimated the extent of stu-
dent poverty to help increase their chances of qualifying for aid under
the P-900 program. Teachers tend to be more concerned with their
school’s rank relative to similar schools than with the opportunity to
use the results to promote in-school dialogue to help diagnose areas
where students appear to have learning difficulties. Some teachers have
been critical of the overly technical nature of the school reports. SIMCE
devotes relatively little attention to data obtained in student, parent,
and teacher questionnaires. Attitudes to learning and student values
proved technically difficult to measure. The SNED program assumes
that financial incentives will inspire teachers to make greater efforts to
enhance student learning.

Sources: Arregui and McLauchlan 2005; Benveniste 2000; Himmel
1996, 1997; McMeekin 2000; Olivares 1996; Wolff 1998.



A.8. United States

Purpose. The National Assessment of Educational Progress (NAEP),
which commenced in 1969, measures students’ educational achieve-
ments and monitors changes in achievement at specified ages and
grades. NAEP, often termed “The Nation’s Report Card,” also exam-
ines achievements of subpopulations defined by demographic charac-
teristics and by specific background experiences. The sample in most
states in NAEP is sufficiently large to allow inferences to be made about
achievement in individual states.

Frequency. Assessments are carried out at least once every second year in
mathematics and reading and less frequently in other curriculum areas.

Grades. 4, 8, and 12. Separate state-level assessments using NAEP
tests are limited to grades 4 and 8.

Achievements assessed. Mathematics, reading, science, writing, the
arts, civics, economics, geography, and U.S. history. New subject areas
to be assessed: foreign language and world history.

Instruments. Achievement tests in reading, mathematics, science,
writing, U.S. history, civics, economics, geography, and the arts. A
student questionnaire (voluntary) at the end of the test booklet
collects information on students’ demographic characteristics, class-
room experiences, and educational support. A teacher questionnaire
focuses on teacher background, training, and instructional practices.
A school questionnaire seeks information on school policies and char-
acteristics. Background data on students with disabilities or English-
language learners are provided by the teacher.

Who did it? A National Assessment Governing Board, appointed by
the Secretary of Education, has overall responsibility for NAEP. The
board consists of governors, state legislators, local and state school
officials, educators, business representatives, and members of the gen-
eral public. Various agencies have been contracted to carry out aspects
of NAEP. Over the 2003–06 period, separate agencies have had
responsibility for each of the following activities: item development,
analysis, sampling and data collection, distribution and scoring, and
Web site maintenance.

Sample or population. Samples of grade 4 and 8 students at the state
level (public schools only) and grade 12 students at the national level.
The sample size for each NAEP test is about 2,500 students in each
state. A separate, long-term-trend study reports national-level results
in mathematics and reading for age samples 9, 13, and 17 drawn from
both public and private schools.

Analysis. Each student takes only a portion of the overall number of
test items in a given content area. Data allow for group comparisons
(for example, male and female students in an individual state). Item
response modeling is used to estimate the measurement characteris-
tics of each assessment question and to create a single scale to repre-
sent performance. Sampling weights are applied to reflect population
characteristics. Scales are constructed that permit comparisons of
assessments conducted in different years for common populations on
related assessments. Quality-control measures are applied at each
analytical stage. Percentages of students falling into each of three
proficiency levels—“basic” (partial mastery of prerequisite knowl-
edge), “proficient” (competent command of subject matter), and “ad-
vanced” (superior level performance)—are reported.
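   As a rough, hypothetical illustration of the final reporting steps
described above (applying sampling weights and reporting the percentage
of students at or above each proficiency level), the sketch below
computes weighted percentages from a handful of invented scale scores
and weights. The cut scores are also invented; the sketch does not
reproduce NAEP’s item response modeling or its actual achievement-level
cut points.

# Hypothetical illustration: weighted percentages of students at or
# above "basic", "proficient", and "advanced" levels. Scores, weights,
# and cut points are invented; NAEP's IRT scaling is not shown.
students = [
    # (scale score, sampling weight)
    (215.0, 1.8), (248.0, 0.9), (263.0, 1.2), (281.0, 1.1),
    (234.0, 2.0), (299.0, 0.7), (252.0, 1.4), (270.0, 1.0),
]

CUTS = {"basic": 243, "proficient": 281, "advanced": 323}  # invented

def weighted_percentages(students, cuts):
    """Percentage of the weighted population at or above each cut score."""
    total = sum(weight for _, weight in students)
    return {
        level: 100.0 * sum(w for score, w in students if score >= cut) / total
        for level, cut in cuts.items()
    }

for level, pct in weighted_percentages(students, CUTS).items():
    print(f"at or above {level}: {pct:.1f}%")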

Use of results. Results are widely publicized. Political spokespersons
and others have used NAEP results to highlight both positive and neg-
ative messages about the quality of the U.S. school system.

Interesting points. NAEP monitors trends in subgroup performance.
Particular attention is given to the rates of progress of minority sub-
groups, notably increases in reading scores since 1971. Overall, reading
and mathematics scores increased for fourth grade students, and the
racial achievement gap narrowed. Generally, flat growth rates in read-
ing achievement were recorded during a period when the number of
Hispanic students (who traditionally have had difficulty mastering
reading in English) doubled. The changing nature of the student popu-
lation makes it difficult to establish whether efforts to improve peda-
gogy and curriculum are having an effect.

Sources: Johnson 1992; U.S. National Center for Education Statistics
2005, 2006.


A.9. Uganda

Purpose. The National Assessment of Progress in Education (NAPE),
which was conducted in July 2005 in the second school term, was one
in a series of national assessments in Uganda. The specific objectives of
the assessment were the following:

• Determine the level of pupils’ achievement in English literacy and
  numeracy.
• Examine relationships between achievement and pupils’ gender and
  age, school location (urban, peri-urban, rural), and zones of the
  country.
• Examine patterns of achievement.
• Compare achievements of grade 3 and grade 6 pupils in 1999 and
  2005.

Frequency. Uganda has carried out national assessments of educational
achievement since 1996. Initially, pairs of subjects (literacy and numer-
acy; science and social studies) were assessed on a three-yearly basis.
From 2003, the focus has been on literacy and numeracy, which are
assessed annually.

Grades. 3 and 6.

Achievements assessed. English literacy and numeracy. Oral fluency in
English is assessed every three years.

Instruments. Achievement tests in literacy and numeracy. Earlier
national assessments used pupil, teacher, and principal question-
naires. Assessments that collect questionnaire data are administered
every three years.

Who did it? Uganda National Examinations Board (UNEB).

Sample or population. Initially districts within each of the country’s
14 zones were sampled. The sample size was increased to ensure a
minimum of three schools within each district.

Analysis. Pupils’ scores on each test were assigned to one of four levels:
“advanced,” “adequate,” “basic,” and “inadequate.” Scores correspond-
ing to levels were determined and set when tests were being constructed
by panels of officials from the National Curriculum Development
Centre, Primary Teachers’ Colleges, Education Standards Agency,
UNEB, and teaching professions. On the 50-item grade 3 English
test, the following score ranges were used to define levels of perfor-
mance: 38–50 “advanced,” 20–37 “adequate,” 15–19 “basic,” and 0–14
“inadequate.” The panels decided that the adequate level was to be con-
sidered the minimum “desired” level of proficiency. Fewer than 40
percent of grade 3 students attained the desired proficiency level in
English (table A.9.1). Achievement test results were reported (in
percentage terms) according to pupils’ age, school location (urban or
rural), geographical region, and zone.
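   The scoring rule just described is simple enough to state directly
in code. The sketch below, in which the pupil scores themselves are
invented, classifies raw scores on the 50-item grade 3 English test
into the four levels using the cut scores reported above and computes
the percentage of pupils reaching the desired (adequate or better)
level.

# Classify raw scores on the 50-item grade 3 English test into the four
# NAPE performance levels, using the cut scores reported above.
def nape_level(raw_score):
    if raw_score >= 38:
        return "advanced"
    if raw_score >= 20:
        return "adequate"
    if raw_score >= 15:
        return "basic"
    return "inadequate"

def percent_at_desired_level(scores):
    """Share of pupils at 'adequate' or better, the minimum desired level."""
    proficient = sum(1 for s in scores
                     if nape_level(s) in ("advanced", "adequate"))
    return 100.0 * proficient / len(scores)

scores = [12, 41, 22, 19, 35, 8, 27, 16, 44, 20]   # invented pupil scores
print(percent_at_desired_level(scores))            # 60.0 for this sample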

Use of results. UNEB printed a poster for each grade 3 and 6 classroom
in Uganda, listing curriculum areas where national-level student perfor-
mance was considered adequate (for example, “We can count num-
bers,” or “We can carry out addition and subtraction of numbers written
in figures and symbols”) and less than adequate (for example, “Help us
to develop a wider vocabulary,” or “Help us to carry out division of
numbers correctly,” or “Help us to solve word problems in math”). It
has prepared a similar poster for teachers.
   UNEB has plans to disseminate key lessons learned from the 2005
NAPE in the form of separate user-friendly reports of the implications
of NAPE for teachers, head teachers, supervisors and inspectors,
teacher educators, and policy makers. It is also designing a pilot initia-
tive to use national assessment approaches to help improve classroom-
based assessment.

Interesting points. The vast majority of students had to take the tests
in their second language. Finding a commonly used language in which
to give a test would be very difficult. More than one-quarter of pri-
mary schools could not be included in the national assessment, in part
because of civil unrest in particular regions. UNEB found that schools


 TABLE A.9.1
 Percentages of Uganda Grade 3 Pupils Rated Proficient in English Literacy,
 2005
 Rating                                  Boys (%)      Girls (%)    All (%)
 Proficient (advanced + adequate)           36.9           39.7       38.3
 Below desired proficiency level
 (basic + inadequate)                      63.1           60.3       61.7
 Source: UNEB 2006, table 3.02.

occasionally inflated their enrollment data to increase their levels of
resource allocation.
   Many of the language items tested came under the general heading
of “grammar” (50 percent for third grade and 30 percent for sixth
grade). In general, students found test items difficult. Many students
obtained relatively low scores (see figure A.9.1). Although the typical
grade 3 student was expected to be about 8 to 9 years of age, the actual
average age of the pupils who sat for the grade 3 test was 10.2 years;
some were 11 years of age and older.
   Substantial achievement differences were found by zonal area. A
total of 87.5 percent of grade 6 students in the Kampala zone achieved
the desired proficiency level in English literacy. The corresponding per-
centage for each of six other zones was less than 30. Performance on
the grade 6 writing subtest revealed substantial differences between
expected and actual levels of performance. Roughly half the students
achieved the desired proficiency level in writing a story about a picture,
one-quarter in writing a letter, and one-tenth in composing and writing
a story. The technical report includes a sample of student letter writing

 FIGURE A.9.1

 Grade 6 Literacy Test Score Distribution in Uganda

 [Histogram showing the frequency of pupils at each test score, with
 scores running from 0 to 85 on the horizontal axis and frequency on
 the vertical axis.]

Source: Clarke 2005.
and lists of common mistakes in the mathematics tests. It also includes
a series of recommendations and lists the agency or unit that should
bear responsibility for following up on recommendations.
    UNEB recruited the services of an external consultant to review the
quality of its work, specifically the quality of the statistical characteris-
tics of its items and the match between the selected items and curricu-
lum objectives. The consultant noted a close match between the items
and curriculum but recommended that more attention be devoted to
problem solving in mathematics. The consultant’s work was somewhat
limited by the nonavailability of information on earlier national assess-
ments relating to test development, sample weights, design, and analy-
sis. Some of the problems stemmed from the fact that some NAPE
analytical work had been contracted to a body outside UNEB. The
consultant recommended that copies of all instruments, details of
sampling and analytical procedures, and other relevant documentation
be kept on file by the national assessment agency (UNEB).

Source: UNEB 2006.
                   APPENDIX B

                   INTERNATIONAL STUDIES



B.1. TRENDS IN INTERNATIONAL MATHEMATICS
AND SCIENCE STUDY

Framework

The central aims of the Trends in International Mathematics and Sci-
ence Study (TIMSS) organized by the International Association for
the Evaluation of Educational Achievement (IEA) were as follows:
• Assess student achievements in mathematics and science, described
  in terms of concepts, processes, skills, and attitudes.
• Describe the context in which student achievement develops, with a
  view to identifying factors related to student learning that might be
  manipulated through policy changes (relating, for example, to cur-
  ricular emphasis, allocation of resources, or instructional practices).
   Three TIMSS studies have been carried out: the first in 45 education
systems in 1994–95 in three populations (grades 3 and 4; grades 7 and
8; last year of secondary school); the second in 38 education systems in
1999 in grade 8; and the third in grades 4 and 8 in 50 systems in 2003.
Additional studies are scheduled for 2007, 2008 (last year of secondary
school only), and 2011.


   TIMSS distinguishes between the intended, the implemented, and
the attained curriculum and, in analyses, explores how they are
interrelated. The intended curriculum represents a statement of society’s
goals for teaching and learning that are typically described in curricula,
syllabi, policy statements, and regulations and are reflected in textbooks,
resources, and examinations. The implemented curriculum is how the
intended curriculum is interpreted by teachers and is made available to
students. Data on implementation (which provides an index of stu-
dents’ opportunity to learn) are collected mainly through questionnaires
administered to teachers and students. The attained curriculum is what
students have learned, as inferred from their performance on tests.


Instrumentation

The following mathematics components are assessed in TIMSS tests:

• Content. Numbers; measurement; geometry; proportionality; func-
  tions, relations, and equations; data, probability, statistics; elementary
  analysis; and validation and structure.
• Performance expectations. Knowing, using routine procedures,
  investigating and problem solving, mathematical reasoning, and
  communicating.
• Perspectives. Attitudes, careers, participation, increasing interest,
  and habits of mind.

The science components of TIMSS comprise the following:

• Content. Earth science; life sciences; physical sciences; science, tech-
  nology, mathematics; history of science; environmental issues; nature
  of science; and science and other disciplines.
• Performance expectations. Understanding; theorizing, analyzing, solv-
  ing problems; using tools, routine procedures, and science processes;
  investigating the natural world; and communication.
• Perspectives. Attitudes, careers, participation, increasing interest,
  safety, and habits of mind.

  Since its inception, TIMSS has modified its frameworks to reflect
curricular and pedagogical changes in participating countries. The
TIMSS designers used a curriculum framework that is based on
earlier studies (in particular, the Second International Mathematics
Study in the case of mathematics) to develop tests through a
consensus-building process among participating countries. Several
hundred items (multiple choice and constructed response) were
piloted and evaluated for appropriateness and curriculum fit. By
distributing test items across booklets, maximum curriculum coverage
was attained without placing too great a burden on the students who
took part in the study; each student responded to only one booklet.
Table B.1.1 presents an example from the curriculum framework for the
TIMSS 2007 assessment.
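   To make the booklet design described above concrete, the following
sketch shows one simple way of rotating blocks of items across booklets
so that the whole item pool is administered while each student responds
to a single booklet. The block and booklet counts are illustrative
assumptions and do not reproduce the actual TIMSS rotation plan.

# Simplified matrix-sampling sketch: rotate item blocks across booklets
# so the full item pool is covered while each student answers only one
# booklet. Block and booklet counts are illustrative, not the TIMSS design.
NUM_BLOCKS = 14          # each block holds a cluster of items
BLOCKS_PER_BOOKLET = 2   # kept short so no student faces the whole pool

def make_booklets(num_blocks=NUM_BLOCKS, per_booklet=BLOCKS_PER_BOOKLET):
    """Pair each block with the next one(s), cyclically, so every block
    appears in the same number of booklets."""
    return [tuple((b + k) % num_blocks for k in range(per_booklet))
            for b in range(num_blocks)]

def assign_booklet(student_id, booklets):
    """Spread sampled students evenly over booklets by rotation."""
    return booklets[student_id % len(booklets)]

booklets = make_booklets()
print(booklets[0])                     # first booklet: blocks (0, 1)
print(assign_booklet(37, booklets))    # booklet assigned to student 37
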
   Questionnaires were constructed and administered to obtain infor-
mation on the following:
• General social and educational contexts (system level)
• Local, community, and school contexts (school level)
• Personal background factors (individual student level).
   Instruments were translated into more than 30 languages.

TABLE B.1.1
Target Percentages of the TIMSS 2007 Mathematics Tests Devoted to
Content and Cognitive Domains, Fourth and Eighth Grades
 Fourth-Grade Content Domains                                        Percentages
 Number                                                                   50
 Geometric Shapes and Measures                                            35
 Data Display                                                             15
 Eighth-Grade Content Domains                                        Percentages
 Number                                                                   30
 Algebra                                                                  30
 Geometry                                                                 20
 Data and Chance                                                          20
 Cognitive Domains                                                   Percentages
                                                       Fourth Grade            Eighth Grade

 Knowing                                                        40                  35
 Applying                                                       40                  40
 Reasoning                                                      20                  25
 Source: Mullis and others 2005, exhibit 2. Reproduced with permission.

Participants

Three populations participated in the original TIMSS in 1994–95:

• Population 1. Students in the pair of adjacent grades that contained the
  most students who were nine years of age (typically grades 3 and 4).
• Population 2. Students in the pair of adjacent grades that contained the
  most students who were 13 years of age (typically grades 7 and 8).
• Population 3. Students in the last year of secondary school. Two
  subpopulations were identified: (a) all students, who took a
  mathematics and literacy test, and (b) students specializing in
  either mathematics or physics, who took a specialized test.

   In 1994–95, 45 education systems participated in TIMSS (Popula-
tions 1, 2, and 3). Among them, one was African (South Africa); eight
were in Asia/Middle East (Hong Kong, China; the Islamic Republic of
Iran; Israel; Japan; the Republic of Korea; Kuwait; Singapore; and Thai-
land); and one was in Latin America and the Caribbean (Colombia).
The names of education systems in this appendix are those listed in
reports of the studies.
   In 1999, 38 education systems participated in TIMSS (Population 2).
Among them, three were in Africa (Morocco, South Africa, and Tuni-
sia); 13 in Asia/Middle East (Chinese Taipei; Hong Kong, China; Indo-
nesia; the Islamic Republic of Iran; Israel; Japan; Jordan; the Republic of
Korea; Malaysia; the Philippines; Singapore; Thailand; and Turkey);
and 2 in Latin America and the Caribbean (Argentina and Chile).
   Fifty education systems participated in TIMSS 2003 (Populations
1 and 2). Among
them were 6 in Africa (Botswana; the Arab Republic of Egypt; Ghana;
Morocco; South Africa; and Tunisia); 17 in Asia/Middle East (Bahrain;
Chinese Taipei; Hong Kong, China; Indonesia; the Islamic Republic of
Iran; Israel; Japan; Jordan; the Republic of Korea; Lebanon; Malaysia;
Palestine; the Philippines; Saudi Arabia; Singapore; the Syrian Arab
Republic; and the Republic of Yemen); and 1 in Latin America and the
Caribbean (Chile).


Some Findings

Table B.1.2 presents results for the 2003 grade 8 mathematics test.
Roughly one-third of the students in the highest-performing systems

TABLE B.1.2
TIMSS Distribution of Mathematics Achievement, Grade 8

                                      Years of    Average                                                          Average          Human
               Countries             Schooling*     Age          Mathematics Achievement Distribution            Scale Score     Development
                                                                                                                                    Index**

             Singapore                   8        14.3                                                            605 (3.6)          0.884
             Korea, Rep. of              8        14.6                                                            589 (2.2)          0.879
           † Hong Kong, SAR              8        14.4                                                            586 (3.3)          0.889
             Chinese Taipei              8        14.2                                                            585 (4.6)            –
             Japan                       8        14.4                                                            570 (2.1)          0.932
             Belgium (Flemish)           8        14.1                                                            537 (2.8)          0.937
           † Netherlands                 8        14.3                                                            536 (3.8)          0.938
             Estonia                     8        15.2                                                            531 (3.0)          0.833
             Hungary                     8        14.5                                                            529 (3.2)          0.837
             Malaysia                    8        14.3                                                            508 (4.1)          0.790
             Latvia                      8        15.0                                                            508 (3.2)          0.811
             Russian Federation       7 or 8      14.2                                                            508 (3.7)          0.779
             Slovak Republic             8        14.3                                                            508 (3.3)          0.836
             Australia                8 or 9      13.9                                                            505 (4.6)          0.939
           ‡ United States               8        14.2                                                            504 (3.3)          0.937
           1 Lithuania                   8        14.9                                                            502 (2.5)          0.824
             Sweden                      8        14.9                                                            499 (2.6)          0.941
           1 Scotland                    9        13.7                                                            498 (3.7)          0.930
           2 Israel                      8        14.0                                                            496 (3.4)          0.905
             New Zealand             8.5 - 9.5    14.1                                                            494 (5.3)          0.917
             Slovenia                 7 or 8      13.8                                                            493 (2.2)          0.881
             Italy                       8        13.9                                                            484 (3.2)          0.916
             Armenia                     8        14.9                                                            478 (3.0)          0.729
           1 Serbia                      8        14.9                                                            477 (2.6)            –
             Bulgaria                    8        14.9                                                            476 (4.3)          0.795
             Romania                     8        15.0                                                            475 (4.8)          0.773
            International Avg.           8        14.5                                                            467 (0.5)            –
             Norway                      7        13.8                                                            461 (2.5)          0.944
             Moldova, Rep. of            8        14.9                                                            460 (4.0)          0.700
             Cyprus                      8        13.8                                                            459 (1.7)          0.891
           2 Macedonia, Rep. of          8        14.6                                                            435 (3.5)          0.784
             Lebanon                     8        14.6                                                            433 (3.1)          0.752
             Jordan                      8        13.9                                                            424 (4.1)          0.743
             Iran, Islamic Rep. of       8        14.4                                                            411 (2.4)          0.719
           1 Indonesia                   8        14.5                                                            411 (4.8)          0.682
             Tunisia                     8        14.8                                                            410 (2.2)          0.740
             Egypt                       8        14.4                                                            406 (3.5)          0.648
             Bahrain                     8        14.1                                                            401 (1.7)          0.839
             Palestinian Nat’l Auth.     8        14.1                                                            390 (3.1)          0.731
             Chile                       8        14.2                                                            387 (3.3)          0.831
          1‡ Morocco                     8        15.2                                                            387 (2.5)          0.606
             Philippines                 8        14.8                                                            378 (5.2)          0.751
             Botswana                    8        15.1                                                            366 (2.6)          0.614
             Saudi Arabia                8        14.1                                                            332 (4.6)          0.769
             Ghana                       8        15.5                                                            276 (4.7)          0.567
             South Africa                8        15.1                                                            264 (5.5)          0.684
           ¶ England                     9        14.3                                                            498 (4.7)          0.930
       Benchmarking Participants
             Basque Country, Spain       8        14.1                                                            487 (2.7)            –
             Indiana State, US           8        14.5                                                            508 (5.2)            –
             Ontario Province, Can.      8        13.8                                                            521 (3.1)            –
             Quebec Province, Can.       8        14.2                                                            543 (3.0)            –
  [The graphical column shows, for each country, the 5th, 25th, 75th,
  and 95th percentiles of performance and the 95% confidence interval
  for the average (±2SE), with symbols marking country averages that are
  significantly higher or lower than the international average.]


*    Represents years of schooling counting from the first year of ISCED level 1.
**   Taken from United Nations Development Programme’s Human Development
     Report 2003, p. 237–240.
†    Met guidelines for sample participation rates only after replacement schools
     were included (see Exhibit A.9).
‡    Nearly satisfied guidelines for sample participation rates only after
     replacement schools were included (see Exhibit A.9).
¶    Did not satisfy guidelines for sample participation rates (see Exhibit A.9).
1    National Desired Population does not cover all of International Desired
     Population (see Exhibit A.6).
2    National Defined Population covers less than 90% of International Desired
     Population (see Exhibit A.6).
▼    Korea tested the same cohort of students as other countries, but later in
     2003, at the beginning of the next school year.
( )  Standard errors appear in parentheses. Because results are rounded to the
     nearest whole number, some totals may appear inconsistent.
A dash (–) indicates comparable data are not available.

Source: Mullis and others 2004, exhibit 1.1. Reproduced with permission.
scored at the advanced benchmark level. In sharp contrast, 19 of the
lowest-scoring systems recorded 1 percent or fewer students at this
benchmark level. Singapore was ranked first at both fourth and
eighth grade on the test. Some systems demonstrated significantly
higher average achievement compared with their performances in
1995 and 1999, whereas others experienced significant score declines.
The Republic of Korea; Hong Kong, China; Latvia; Lithuania; and
the United States were among those that improved at grade 8.
   Overall, gender differences in mathematics achievement were neg-
ligible. Girls, however, outperformed boys in some systems, while
boys did better in other systems. A high level of parental education
was associated with higher achievement scores in virtually all sys-
tems. At both fourth and eighth grades in the 2003 study, the number
of books in the home correlated significantly with students’ mathe-
matics achievement.
   The extent of coverage of the curriculum tested in TIMSS 2003 var-
ied across systems. Teachers’ reports on grade 8 students indicated that,
on average, 95 percent had been taught number topics, 78 percent
measurement topics, 69 percent geometry topics, 66 percent algebra
topics, and 46 percent data topics. More than 80 percent of students
were taught by teachers who had at least some professional training in
mathematics. Textbooks were widely used as the foundation for teach-
ing. Calculator usage, in contrast, varied greatly from system to system.
Widespread use in grade 4 was permitted in only five systems. Schools
that had few students living in economically disadvantaged homes
scored on average 57 points higher in grade 8 and 47 points higher in
grade 4 than schools in which more than half the students came from
disadvantaged homes.



B.2. PROGRESS IN INTERNATIONAL READING
LITERACY STUDY

Framework

IEA’s 1991 Reading Literacy Study served as the basis for the definition
of reading literacy in the Progress in International Reading Literacy
Study (PIRLS). For PIRLS (both 2001 and 2006), reading literacy was
defined as

  . . . the ability to understand and use those written language forms
  required by society, and/or valued by the individual. Young read-
  ers can construct meaning from a variety of texts. They read to
  learn, to participate in communities of readers, and for enjoyment
  (IEA 2000, 3).

   The assessment framework for PIRLS comprises two major reading
purposes crossed with four processes of comprehension. The purposes
are the following:

• Literary. Reading for literary experience in which the reader engages
  with text to become involved in imagined events and characters, and
  to enjoy language itself.
• Informational. Reading to acquire and use information, in which the
  reader engages with aspects of the real world represented either in
  chronological texts (for example, when events are described in biog-
  raphies, recipes, and instructions) or in nonchronological text, in
  which ideas are organized logically rather than chronologically (for
  example, in discussion or persuasion texts).

The processes of comprehension require students to do the following:

• Focus on and retrieve explicitly stated information. For example,
  look for specific ideas; find the topic sentence or main idea when
  explicitly stated.
• Make straightforward inferences. For example, infer that one event
  caused another; identify generalizations in the text.
• Interpret and integrate ideas and information. For example, discern
  the overall message or theme of a text; compare and contrast text
  information.
• Examine and evaluate content, language, and textual elements. Describe
  how the author devised a surprise ending; judge the completeness or
  clarity of information in the text.

PIRLS was carried out in 2001 and 2006.

Instruments

It was estimated that using “authentic” texts (that is, ones typical of
those read by students in their everyday experiences) for each pur-
pose (reading for literary experience and reading to acquire and use
information) would require four hours of testing time. Because ex-
pecting any individual student to sit for more than one hour in a
test situation did not seem reasonable, the assessment material was
distributed across 10 booklets, only one of which was responded to
by each individual student.
   Students’ ability in each of the four comprehension processes was
assessed in questions that accompanied texts. Two formats were used:
multiple choice and constructed response.
   Information on students’ attitudes to reading and on their reading
habits was obtained in a questionnaire. Questionnaires were also
administered to students’ parents, teachers, and school principals to
gather information about students’ home and school experiences that
were considered relevant to the development of reading literacy.


Participants

The target population for PIRLS was defined as the upper of the two
adjacent grades with the most nine-year-olds. In most systems, this was
the fourth grade.
   Thirty-five education systems participated in PIRLS in 2001.
They included one in Africa (Morocco); six in Asia/Middle East
(Hong Kong, China; the Islamic Republic of Iran; Israel; Kuwait;
Singapore; and Turkey); and three in Latin America and the
Caribbean (Argentina, Belize, and Colombia) (Mullis and others
2003). Forty-one systems participated in PIRLS 2006. The number
from Africa increased by one (with the addition of South Africa).
The number of Asian/Middle Eastern countries increased by two
(with the addition of Chinese Taipei, Indonesia, and Qatar but with
Turkey dropping out). One Latin American and Caribbean system
participated (Trinidad and Tobago joined, while the three that had
participated in 2001 did not participate).
   PIRLS is scheduled for administration again in 2011.

Some Findings

Four benchmarks were created on the basis of students’ test scores.
Those benchmarks were the lower quarter benchmark, defined as
the 25th percentile (the point above which the top 75 percent of
students scored); the median benchmark, defined as the 50th
percentile; the upper quarter benchmark, defined as the 75th per-
centile; and the top 10 percent benchmark, defined as the 90th
percentile. If reading achievement scores were distributed in the
same way in each country, approximately 10 percent of students in
each country would be ranked in the top benchmark. Table B.2.1
presents the results for participating countries. It shows, for
example, that 24 percent of English students scored in the highest
category and that 10 systems had fewer than 5 percent of students
in this category.
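   As a minimal sketch of how such percentile-based benchmarks can be
derived and used, the fragment below computes the four cut points from
a pooled (and entirely invented) score distribution and then reports
the percentage of one country’s students reaching each benchmark. The
scores and the simple nearest-rank percentile rule are assumptions for
illustration only; they do not reproduce the PIRLS scaling.

# Illustrative only: derive four benchmarks as percentiles of a pooled
# score distribution, then report the percentage of one country's
# students at or above each benchmark. All scores are invented.
def percentile(sorted_scores, p):
    """Nearest-rank percentile; adequate for a sketch."""
    k = round(p / 100 * len(sorted_scores)) - 1
    return sorted_scores[max(0, min(len(sorted_scores) - 1, k))]

pooled = sorted([380, 410, 435, 460, 480, 505, 510, 530,
                 555, 570, 590, 615, 640])

benchmarks = {
    "lower quarter (25th percentile)": percentile(pooled, 25),
    "median (50th percentile)":        percentile(pooled, 50),
    "upper quarter (75th percentile)": percentile(pooled, 75),
    "top 10% (90th percentile)":       percentile(pooled, 90),
}

country = [420, 455, 470, 500, 515, 540, 560, 580, 600, 630]

for name, cut in benchmarks.items():
    pct = 100.0 * sum(1 for s in country if s >= cut) / len(country)
    print(f"{name}: cut score {cut}, {pct:.0f}% at or above")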
    Girls recorded significantly higher mean scores than boys in all
systems. On the items that measured reading for informational pur-
poses, students in Sweden, the Netherlands, and Bulgaria scored
highest. Early literacy activities before commencing school, such as
reading books and telling stories, were positively related to later
reading performance. Higher reading achievement scores were ob-
tained by the children of parents who had favorable attitudes to
reading. Students who spoke the language used in the assessment at
home tended to have higher scores than students who spoke other
languages. Principals’ responses indicated that reading was empha-
sized across systems more than any other curriculum area in grades
1 to 5.
    Teachers, on average, said that they asked the majority of fourth
graders to read aloud to the whole class daily. They made relatively
little use of libraries, even though libraries tended to be available.
On average, most teachers relied on their own assessments rather
than on objective tests when monitoring student progress. Almost
two out of every three students said that they read stories or novels
at least once a week. Across all systems, students’ attitudes to read-
ing were positively related to reading achievement.

TABLE B.2.1
Percentages of Students Reaching PIRLS Benchmarks in Reading Achievement,
Grade 4

                                           Percentages of Students Reaching                        Top 10%    Upper Quarter    Median     Lower Quarter
                  Countries                    International Benchmarks                           Benchmark     Benchmark     Benchmark     Benchmark
              ** England                                                                          24 (1.6)   45 (1.9)    72 (1.6)    90 (1.0)
                  Bulgaria                                                                        21 (1.3)   45 (1.9)    72 (1.9)    91 (1.1)
                  Sweden                                                                          20 (1.1)   47 (1.4)    80 (1.3)    96 (0.5)
              * United States                                                                     19 (1.3)   41 (2.0)    68 (2.0)    89 (1.2)
                  New Zealand                                                                     17 (1.4)   35 (1.7)    62 (1.9)    84 (1.3)
               1   Canada (O, Q)                                                                  16 (1.0)    37 (1.3)   69 (1.3)    93 (0.6)
                  Singapore                                                                       15 (1.5)   35 (2.3)    64 (2.3)    85 (1.6)
              * Netherlands                                                                       14 (1.0)   40 (1.7)    79 (1.5)    98 (0.5)
                  Italy                                                                           14 (1.0)   36 (1.3)    69 (1.5)    92 (0.8)
              * Scotland                                                                          14 (1.1)    32 (1.8)   62 (1.8)    87 (1.1)
                  Hungary                                                                         13 (0.9)   36 (1.5)    71 (1.2)    94 (0.6)
               1   Lithuania                                                                      13 (1.4)   36 (1.7)    71 (1.7)    95 (0.6)
                  Latvia                                                                          12 (1.1)    36 (1.6)   73 (1.5)    96 (0.6)
                  Germany                                                                         12 (0.8)    34 (1.3)   69 (1.2)     93 (0.6)
               2   Israel                                                                         11 (0.8)   28 (1.2)    54 (1.4)     79 (1.1)
                  Romania                                                                         11 (1.3)   27 (2.0)    54 (2.1)    81 (1.7)
                  Czech Republic                                                                  10 (0.9)   32 (1.5)    68 (1.5)    93 (0.7)
               2   Greece                                                                         10 (0.8)   28 (2.0)    60 (2.2)    89 (1.2)
                  France                                                                           9 (0.9)   26 (1.2)    60 (1.4)    90 (0.9)
               2   Russian Federation                                                              8 (1.0)   27 (2.1)    64 (2.3)    92 (1.6)
                  Slovak Republic                                                                  7 (1.0)    23 (1.4)   59 (1.7)    88 (1.1)
                  Iceland                                                                          7 (0.6)    23 (1.0)   53 (1.0)    85 (0.8)
                  Hong Kong, SAR                                                                   6 (0.7)   26 (1.7)    64 (1.9)    92 (1.1)
                  Norway                                                                           6 (0.9)   19 (1.2)    48 (1.4)    80 (1.4)
                  Cyprus                                                                           6 (0.8)   18 (1.3)    45 (1.6)    77 (1.4)
                  Slovenia                                                                         4 (0.5)    17 (1.0)   48 (1.2)     83 (0.9)
                   Moldova, Rep. of                                                                4 (0.9)    15 (1.8)   42 (2.5)     79 (1.7)
                  Macedonia, Rep. of                                                               3 (0.4)    10 (0.9)   28 (1.5)    55 (2.1)
                  Turkey                                                                           2 (0.3)    7 (0.9)    25 (1.6)    58 (1.7)
                  Argentina                                                                        2 (0.4)    5 (0.8)    17 (1.6)    46 (2.5)
                  Iran, Islamic Rep. of                                                            1 (0.2)     4 (0.5)   16 (1.4)    42 (1.9)
                  Colombia                                                                         1 (0.4)    3 (0.8)    14 (1.5)    45 (2.4)
               2   Morocco                                                                         1 (0.9)    3 (1.4)      8 (2.1)   23 (3.0)
                  Kuwait                                                                           0 (0.1)     2 (0.4)   10 (1.1)    36 (2.0)
                  Belize                                                                           0 (0.2)     1 (0.4)    5 (0.6)    16 (1.3)




              Ontario (Canada)                                                                    19 (1.4)   40 (1.8)    70 (1.6)    92 (0.8)
              Quebec (Canada)                                                                     11 (1.0)   31 (1.8)    67 (2.0)    94 (0.8)

  [Bars show the percentage of students at or above each benchmark.
  Benchmark score points: Top 10% Benchmark (90th percentile) = 615;
  Upper Quarter Benchmark (75th percentile) = 570; Median Benchmark
  (50th percentile) = 510; Lower Quarter Benchmark (25th percentile) = 435.]


*    Canada is represented by the provinces of Ontario and Quebec only. The
     international average does not include the results from these provinces
     separately.
†    Met guidelines for sample participation rates only after replacement
     schools were included (see Exhibit A.7).
‡    Nearly satisfied guidelines for sample participation rates after
     replacement schools were included (see Exhibit A.7).
¶    National Desired Population does not cover all of International Desired
     Population. Because coverage falls below 65%, Canada is annotated
     Canada (O, Q) for the provinces of Ontario and Quebec only.
2a   National Defined Population covers less than 95% of International Desired
     Population (see Exhibit A.4).
2b   National Defined Population covers less than 80% of International Desired
     Population (see Exhibit A.4).
( )  Standard errors appear in parentheses. Because results are rounded to the
     nearest whole number, some totals may appear inconsistent.
Source: Mullis and others 2004, exhibit 1.1. Reproduced with permission.

B.3. PROGRAMME FOR INTERNATIONAL STUDENT
ASSESSMENT

Framework

The Programme for International Student Assessment (PISA) assesses
the knowledge and skills of 15-year-old students at three-year intervals
under the auspices of the Organisation for Economic Co-operation and
Development (OECD). PISA was developed to provide regular indica-
tors of students’ achievement near the end of compulsory schooling for
the OECD International Indicators of Education Systems.
    Students are assessed in three domains: reading, mathematics, and
science. To date, three PISA assessments have been carried out. In
2000, reading was the major domain assessed, with mathematics and
science as minor domains. In 2003, mathematics was the major do-
main; reading and science were minor domains. In 2006, science was
the major domain; reading and mathematics were minor domains.
    PISA is designed to be used by individual countries (a) to gauge the
literacy skills of students in comparison with students in participating
countries, (b) to establish benchmarks for educational improvement in
terms of the performance of students in other countries, and (c) to assess
their capacity to provide high levels of equity in educational opportuni-
ties and outcomes. PISA attempts to assess the extent to which students
near the end of compulsory education have acquired some of the knowl-
edge and skills that are essential for full participation in society.


Participants

In 2000, 32 countries participated in PISA. Two years later, 11 more
countries took the PISA 2000 assessment tasks. No African country
participated in the 2000 assessment. Asian/Middle Eastern partici-
pants included two OECD countries (Japan and the Republic of
Korea) and five non-OECD “partner” countries (Hong Kong, China;
Indonesia; Israel; Russian Federation; and Thailand). Systems in Latin
America and the Caribbean included Mexico as well as the following
non-OECD countries: Argentina, Brazil, Chile, and Peru. All 30 OECD
member states and a further 11 “partner” systems took part in 2003.
Among the new partner systems, one was in Africa (Tunisia); one in
Asia (Macao, China); and one in Latin America and the Caribbean
(Uruguay). Three original partner systems (Argentina, Chile, and
Peru) did not participate in the 2003 assessment. Turkey, an OECD
country, participated for the first time in 2003. By 2006, the number
of participating systems had risen to 57. Tunisia remained the only
participating African system. New partner systems in Asia/Middle
East included Azerbaijan, Chinese Taipei, Jordan, Kyrgyzstan, and
Qatar. Latin American systems that had participated in either the
2000 or the 2003 assessment took the 2006 PISA tests, as did one
new partner system (Colombia).
   The population of interest is 15-year-old students. They are sampled
at random across grade levels in participating schools.


Instruments

The Reading Literacy test assumes that students are technically able
to read and attempts to assess their ability to understand and reflect
on a wide range of written materials in different situations. Three
dimensions are identified: the content or structure of texts (continuous,
such as narrative and descriptive, and noncontinuous, such as tables,
charts, and forms); the processes that need to be performed (retrieval,
interpretation, reflection, and evaluation); and the situation in which
knowledge and skills are drawn on or applied (personal, public,
occupational, and educational).
   The Mathematical Literacy test is concerned with the capacity
of students to analyze, reason, and communicate ideas as they for-
mulate, solve, and interpret mathematical problems in a variety of
contexts. Three dimensions are distinguished in the mathematical
framework: content (space and shape, change and relationships,
quantity, and uncertainty); competencies (the reproduction cluster,
the connections cluster, and the reflection cluster); and situations
(personal, educational, or occupational, public, and scientific). Test
items tend more toward “real life” situations than is normally the
case in conventional achievement tests (see figure B.3.1).
   The Scientific Literacy test assesses students’ ability to draw
appropriate conclusions from evidence and information given to

 FIGURE B.3.1

 Sample of PISA Mathematics Items

  CARPENTER

  A carpenter has 32 metres of timber and wants to make a border
  around a garden bed. He is considering the following designs for the
  garden bed.

  [Diagrams of four garden-bed designs, labeled A to D, appear here;
  each is marked with dimensions of 6 m and 10 m, but the shapes differ.]

  Question 1
  Circle either “Yes” or “No” for each design to indicate whether the
  garden bed can be made with 32 metres of timber.

   Garden bed design    Using this design, can the garden bed
                        be made with 32 metres of timber?
   Design A             Yes / No
   Design B             Yes / No
   Design C             Yes / No
   Design D             Yes / No

Source: OECD 2003. Reproduced with permission.

them, to criticize claims on the basis of evidence, and to distinguish
opinion from evidence-based statements. The framework for science
comprises three dimensions: scientific concepts (selected from phys-
ics, chemistry, biological science, and earth and space science);
processes (describing, explaining, and predicting scientific phenomena;
understanding scientific investigation; and interpreting scientific evi-
dence and conclusions); and application (in life and health; in earth
and environment; in technology).
   Having many more test items than an individual student could com-
plete ensures adequate coverage of the domains of interest. Test items
are spread across 13 booklets that consist of various combinations of
mathematics, reading, science, and problem solving.
   Questionnaires were administered to students (to obtain informa-
tion on their engagement with learning, their learning strategies, and
beliefs about themselves; their perception of the learning environment;
and their home background) and to the principals of schools (to obtain
information on school policies and practices and the quality of avail-
able resources) (OECD 2004b).


Some Findings

PISA reports the mean scores of countries in a “league table” (figure
B.3.2). It also categorizes student performance by proficiency level
based on what test scores indicate students can typically do. Figure
B.3.3 describes the skills associated with each of six PISA proficiency
levels for mathematics. The following figure (figure B.3.4) summarizes
how students in each country performed by proficiency level.
   The results indicate very considerable differences between countries
such as Finland, the Republic of Korea, and Canada, where the major-
ity of students score above Level 2, and Brazil, Tunisia, and Indonesia,
where a small minority achieve this level of proficiency. Other findings
show that fewer than 5 percent of students in OECD countries achieved
Level 6, while about one-third were able to perform the tasks associ-
ated with Levels 4, 5, and 6. Eleven percent of students were not able
to perform the Level 1 mathematics tasks. In most countries, males
tended to score higher than females, especially in tasks associated with
space and shape. In some countries (Australia, Austria, Japan, the
Netherlands, Norway, and Poland), gender differences in achievement
were not significant. Females tended to have a lower interest in—and
enjoyment of—mathematics, and they claimed to experience more
stress than males in this curriculum area.
      FIGURE B.3.2

      PISA Mean Reading Literacy Scores and Reading Subscale Scores, 2000

                                                                                                                 READING SUBSCALES
         Combined reading literacy score                         Retrieving Information                              Interpreting texts                                  Reflecting on texts
           Country                    Average                 Country                 Average                 Country                      Average                Country                   Average
           Finland                      546                   Finland                   556                   Finland                       555                   Canada                      542
           Canada                       534                   Australia                 536                   Canada                        532                   United Kingdom              539
           New Zealand                  529                   New Zealand               535                   Australia                     527                   Ireland                     533
           Australia                    528                   Canada                    530                   Ireland                       526                   Finland                     533
           Ireland                      527                   Korea, Republic of        530                   New Zealand                   526                   Japan                       530
           Korea, Republic of           525                   Japan                     526                   Korea, Republic of            525                   New Zealand                 529
           United Kingdom               523                   Ireland                   524                   Sweden                        522                   Australia                   526
           Japan                        522                   United Kingdom            523                   Japan                         518                   Korea, Republic of          526
           Sweden                       516                   Sweden                    516                   Iceland                       514                   Austria                     512
           Austria                      507                   France                    515                   United Kingdom                514                   Sweden                      510
           Belgium                      507                   Belgium                   515                   Belgium                       512                   United States               507
           Iceland                      507                   Norway                    505                   Austria                       508                   Norway                      506
           Norway                       505                   Austria                   502                   France                        506                   Spain                       506
           France                       505                   Iceland                   500                   Norway                        505                   Iceland                     501
           United States                504                   United States             499                   United States                 505                   Denmark                     500
           Denmark                      497                   Switzerland               498                   Czech Republic                500                   Belgium                     497
           Switzerland                  494                   Denmark                   498                   Switzerland                   496                   France                      496
           Spain                        493                   Italy                     488                   Denmark                       494                   Greece                      495
           Czech Republic               492                   Spain                     483                   Spain                         491                   Switzerland                 488
           Italy                        487                   Germany                   483                   Italy                         489                   Czech Republic              485
           Germany                      484                   Czech Republic            481                   Germany                       488                   Italy                       483
           Hungary                      480                   Hungary                   478                   Poland                        482                   Hungary                     481
           Poland                       479                   Poland                    475                   Hungary                       490                   Portugal                    480
           Greece                       474                   Portugal                  455                   Greece                        475                   Germany                     478
           Portugal                     470                   Greece                    450                   Portugal                      473                   Poland                      477
           Luxembourg                   441                   Luxembourg                433                   Luxembourg                    446                   Mexico                      446
           Mexico                       422                   Mexico                    402                   Mexico                        419                   Luxembourg                  442
           OECD average                 500                   OECD average              498                   OECD average                  501                   OECD average                502

           Non-OECD countries                                 Non-OECD countries                              Non-OECD countries                                  Non-OECD countries
           Liechtenstein                483                   Liechtenstein             492                   Liechtenstein                 484                   Liechtenstein               468
           Russian Federation           462                   Latvia                    451                   Russian Federation            468                   Latvia                      458
           Latvia                       458                   Russian Federation        451                   Latvia                        459                   Russian Federation          455
           Brazil                       396                   Brazil                    365                   Brazil                        400                   Brazil                      417


      NOTE: Although the Netherlands participated in the Programme for International Student Assessment (PISA) in 2000, technical problems with its sample prevent its results from being discussed
      here. For information on the results for the Netherlands, see OECD (2001). The OECD average is the average of the national averages of 27 OECD countries. Because PISA is principally an OECD
      study, the results for non-OECD countries are displayed separately from those of the OECD countries and not included in the OECD average.

      SOURCE: Organization for Economic Cooperation and Development, Program for International Student Assessment (PISA) 2000.

       [In the original figure, shading indicates whether each average is significantly higher than, not significantly different from, or significantly lower than the U.S. average.]
      Source: OECD 2001, figure 3. Reproduced with permission.
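
   The legend of figure B.3.2 classifies each national mean as significantly
higher than, not significantly different from, or significantly lower than
the U.S. average. A comparison of that kind can be sketched as follows; the
standard errors used here are invented for illustration (PISA publishes the
actual standard errors, and its comparisons also account for the shared U.S.
sample).

```python
# Illustrative classification of national means relative to the U.S. mean
# combined reading literacy score (PISA 2000). The standard errors are
# invented for this sketch; published comparisons use PISA's actual values.
from math import sqrt

us_mean, us_se = 504, 7.0
countries = {"Finland": (546, 2.6), "Denmark": (497, 2.4), "Mexico": (422, 3.3)}

for name, (mean, se) in countries.items():
    z = (mean - us_mean) / sqrt(se ** 2 + us_se ** 2)
    if z > 1.96:
        verdict = "significantly higher than the U.S. average"
    elif z < -1.96:
        verdict = "significantly lower than the U.S. average"
    else:
        verdict = "not significantly different from the U.S. average"
    print(f"{name}: mean {mean}, {verdict}")
```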
 FIGURE B.3.3

 Student Proficiency Levels in PISA Mathematics

 What students can typically do at each level (score-point boundaries
 between levels are shown in parentheses):

 Level 6 (above 668 score points). Students can conceptualize, generalize,
 and utilize information based on their investigation and modeling of complex
 problem situations. They can link different information sources and
 representations and flexibly translate among them. Students at this level
 are capable of advanced mathematical thinking and reasoning. These students
 can apply insight and understanding along with a mastery of symbolic and
 formal mathematical operations and relationships to develop new approaches
 and strategies for dealing with novel situations. Students at this level can
 formulate and precisely communicate their actions and reflections regarding
 their findings, interpretations, arguments, and the appropriateness of these
 to the original situations.

 Level 5 (606 to 668 score points). Students can develop and work with models
 for complex situations, identifying constraints and specifying assumptions.
 They can select, compare, and evaluate appropriate problem-solving strategies
 for dealing with complex problems related to these models. Students at this
 level can work strategically using broad, well-developed thinking and
 reasoning skills, appropriately linked representations, symbolic and formal
 characterizations, and insight pertaining to these situations. They can
 reflect on their actions and formulate and communicate their interpretations
 and reasoning.

 Level 4 (544 to 606 score points). Students can work effectively with
 explicit models for complex concrete situations that may involve constraints
 or call for making assumptions. They can select and integrate different
 representations, including symbolic ones, linking them directly to aspects
 of real-world situations. Students at this level can utilize well-developed
 skills and reason flexibly, with some insight, in these contexts. They can
 construct and communicate explanations and arguments based on their
 interpretations, arguments, and actions.

 Level 3 (482 to 544 score points). Students can execute clearly described
 procedures, including those that require sequential decisions. They can
 select and apply simple problem-solving strategies. Students at this level
 can interpret and use representations based on different information sources
 and reason directly from them. They can develop short communications
 reporting their interpretations, results, and reasoning.

 Level 2 (420 to 482 score points). Students can interpret and recognize
 situations in contexts that require no more than direct inference. They can
 extract relevant information from a single source and make use of a single
 representational mode. Students at this level can employ basic algorithms,
 formulae, procedures, or conventions. They are capable of direct reasoning
 and making literal interpretations of the results.

 Level 1 (358 to 420 score points). Students can answer questions involving
 familiar contexts where all relevant information is present and the
 questions are clearly defined. They are able to identify information and to
 carry out routine procedures according to direct instructions in explicit
 situations. They can perform actions that are obvious and follow immediately
 from the given stimuli.




Source: OECD 2004a, figure 1. Reproduced with permission.
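
   The score-point boundaries in figure B.3.3 (358, 420, 482, 544, 606, and
668) can be used to translate a scale score into a proficiency level. Below
is a minimal sketch, assuming the convention that scores below 358 are
reported as "below Level 1".

```python
# Map a PISA mathematics scale score to a proficiency level using the
# boundaries shown in figure B.3.3; scores below 358 are treated as
# "below Level 1".
import bisect

BOUNDARIES = [358, 420, 482, 544, 606, 668]

def proficiency_level(score):
    level = bisect.bisect_right(BOUNDARIES, score)  # 0 to 6
    return f"Level {level}" if level else "Below Level 1"

for score in (300, 400, 545, 700):
    print(score, "->", proficiency_level(score))
# 300 -> Below Level 1, 400 -> Level 1, 545 -> Level 4, 700 -> Level 6
```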
    FIGURE B.3.4

    Percentage of Students at Each Proficiency Level on PISA Mathematics Scale

    [Stacked bar chart showing, for each participating country, the percentage
    of students below Level 1 and at Levels 1 through 6 on the PISA
    mathematics scale. Countries are ranked in descending order of the
    percentage of 15-year-olds at Levels 2, 3, 4, 5, and 6.]

Source: OECD 2003b, figure 2.16a. Reproduced with permission.




   U.S. students tended to have stronger “self-concepts” in mathematics than
students in other countries. In contrast, students in Japan and the Republic
of Korea, countries
which had scored higher on the mathematics test, tended to have
relatively weak self-concepts in mathematics. Parental occupation
and parental support for education were strongly related to student
achievement.
   Gender differences in science achievement were seldom apparent.
Similar percentages of males and females recorded particularly high
and low scores. In reading, Finland’s mean score was more than one-
half a proficiency level above the OECD mean.
    FIGURE B.3.5

    Percentage of Students at Each Proficiency Level on PISA Reading Scale

    [Stacked bar chart showing, for each participating country, the percentage
    of students below Level 1 and at Levels 1 through 5 on the PISA reading
    scale. Countries are ranked in descending order of the percentage of
    15-year-olds at Levels 3, 4, and 5.]

Source: OECD 2004b, figure 6.2. Reproduced with permission.



   Finland, along with the Republic of Korea and Canada, also recorded relatively low internal
differences, suggesting greater levels of educational equity than in most
participating countries. Very few students in Indonesia, Tunisia, or
Serbia achieved at Level 3 or higher (see figure B.3.5).
                   APPENDIX C

                   REGIONAL STUDIES



C.1. SOUTHERN AND EASTERN AFRICA CONSORTIUM
FOR MONITORING EDUCATIONAL QUALITY

Framework

The Southern and Eastern Africa Consortium for Monitoring Educa-
tional Quality (SACMEQ) is a voluntary grouping of ministries of
education in southern and eastern Africa, comprising Botswana, Kenya,
Lesotho, Malawi, Mauritius, Mozambique, Namibia, Seychelles, South
Africa, Swaziland, Tanzania (mainland), Tanzania (Zanzibar), Uganda,
Zambia, and Zimbabwe. Launched in 1995 with the assistance of the
International Institute for Educational Planning (IIEP) of the United
Nations Educational, Scientific, and Cultural Organization (UNESCO),
SACMEQ was designed (a) to develop institutional capacity through
joint training (“learning by doing” for education planners) and coop-
erative education policy research on schooling and quality of educa-
tion (for example, identifying weaknesses in education systems in
terms of inputs and processes) and (b) to monitor changes in
achievement (IIEP 2007). A notable feature of SACMEQ is its sys-
tematic strategy for consulting with senior policy makers in government
to identify issues of concern that might be addressed in empirical


studies. It also seeks to promote stakeholder involvement and greater
transparency in decision making. The first round of SACMEQ studies
was carried out between 1995 and 1999.
   Policy concerns for SACMEQ II studies that were carried out
between 2000 and 2003 were clustered under five main themes
(Murimba 2005b; Passos and others 2005):

• Pupil characteristics and their learning environments
• Teacher characteristics and perceptions (for example, on teaching
  and resources)
• School head characteristics and perceptions (for example, on the
  operation of schools and problems encountered)
• Equity in the allocation of human and material resources among
  regions and schools
• Achievements in reading and mathematics of pupils and their
  teachers.

   SACMEQ was based on an earlier (1991) study carried out in
Zimbabwe (Ross and Postlethwaite 1991) and began as a series of na-
tional studies. Nevertheless, it had an international dimension because
studies shared many features (research questions, instruments, target
populations, sampling procedures, and analyses). A separate report is
prepared for each country. Cross-national comparisons were made
for SACMEQ II but not for SACMEQ I.


Instruments

Data were collected on the reading literacy and numeracy levels of
students in a test of achievement. A number of items from the Trends
in International Mathematics and Science Study (TIMSS) were em-
bedded in SACMEQ II tests to provide comparative data. Question-
naires were used to collect data on baseline indicators for educational
inputs, general conditions of schooling, and equity assessments for
human and material resource allocation. Information on home back-
ground conditions was obtained through pupil questionnaires; pupils
were asked to indicate the number of possessions in their homes from
a list that included items such as a daily newspaper, a weekly or a
monthly magazine, a radio, a TV set, a telephone, a motorcycle, a
bicycle, piped water, and electricity.
   SACMEQ II tests included items selected from four earlier studies:
the Zimbabwe Indicators of the Quality of Education Study, SAC-
MEQ I, TIMSS, and the International Association for the Evaluation
of Educational Achievement (IEA) Study of Reading Literacy. Using
those items made possible the comparison of student performance in
the studies with performance in SACMEQ II.
   Reports devote considerable space to describing teacher characteris-
tics (for example, qualifications) and conditions in schools (for example,
classroom furniture, supplies, size, and space); how they compare with
ministry benchmarks; and how they vary by school and location.
   SACMEQ II adopted the definition of reading literacy used in the
IEA Study of Reading Literacy (in 1990): “[T]he ability to understand
and use those written language forms required by society and/or valued
by the individual” (Elley 1992, 3). It also based the development of the
test on the three domains identified in the IEA study:

• Narrative prose. Continuous text where the writer’s aim is to tell a
  story, whether fact or fiction.
• Expository prose. Continuous text designed to describe, explain, or
  otherwise convey factual information or opinion.
• Documents. Structured information displays presented in the form of
  charts, tables, maps, graphs, lists, or sets of instructions.

  A table of specifications was constructed in which the three domains
were crossed with seven levels of reading skill:

•   Verbatim recall
•   Paraphrase concept
•   Find main idea
•   Infer from text
•   Locate information
•   Locate and process
•   Apply rules.

  Mathematics literacy in SACMEQ II was defined as “the capacity to
understand and apply mathematical procedures and make related
judgments as an individual and as a member of the wider society”
(Shabalala 2005, 76). The test assessed competency in three domains:

• Number. Operations and number line, square roots, rounding and
  place value, significant figures, fractions, percentages, ratio
• Measurement. Related to distance, length, area, capacity, money,
  time
• Space-data. Geometric shapes, charts, tables of data.

   The table of specifications matched those three domains with five
“proposed” (or expected) skill levels, ranging from, for example, the
ability to undertake simple single operations using up to two-digit
numbers (Level 1) to the ability to make computations involving
several steps and a mixture of operations using fractions, decimals,
and whole numbers (Level 5).
   Most test items were in multiple-choice format.
   Results were presented in three forms: (a) mean scores, (b) percent-
ages of pupils reaching minimum and desirable levels of achievement,
and (c) percentages of pupils reaching eight competence levels derived
through an item response theory (Rasch) scaling technique.
   Mean scores are average measures of performance and may be used
to describe the performance of different categories of pupils (for ex-
ample, boys and girls, pupils living in different provinces or districts).
   Minimum and desirable levels of achievement were defined by
expert committees (consisting of curriculum specialists, researchers,
and experienced teachers) before the collection of data. Two levels
were identified:

• A minimum level that would indicate a pupil would barely survive
  during the next year of schooling
• A desirable level that would indicate a pupil would be able to cope
  with the next year of schooling.

   Analyses were carried out to identify the variety of levels of skills
displayed by pupils and to provide greater insight into the nature of
pupils’ achievements. Reading skills associated with eight levels
included the following:

• Level 1. Prereading: matches words and pictures involving concrete
  concepts and everyday objects.
• Level 2. Emergent reading: matches words and pictures involving
  prepositions and abstract concepts; uses cuing systems to interpret
  phrases by reading forward.
• Level 3. Basic reading: interprets meaning (by matching words and
  phrases completing a sentence) in a short and simple text.
• Level 4. Reading for meaning: reads forward and backward to link
  and interpret information located in various parts of a text.
• Level 5. Interpretive reading: reads forward and backward to
  combine and interpret information from various parts of a text in
  association with (recalled) external information that completes
  and contextualizes meaning.
• Level 6. Inferential reading: reads through longer (narrative,
  expository) texts to combine information from various parts of a
  text to infer the writer’s purpose.
• Level 7. Analytical reading: locates information in longer (narrative,
  expository) texts to combine information to infer the writer’s per-
  sonal beliefs (value systems, prejudices, biases).
• Level 8. Critical reading: locates information in longer (narrative,
  expository) texts to infer and evaluate what the writer has assumed
  about both the topic and characteristics of the reader (for example,
  age, knowledge, personal beliefs, values).

  Mathematics skills associated with eight levels included the following:

• Level 1. Prenumeracy: applies single-step identification or subtrac-
  tion operations; recognizes simple shapes; matches numbers and
  pictures; counts in whole numbers.
• Level 2. Emergent numeracy: applies a two-step addition or
  subtraction operation involving carrying and checking (through
  basic estimation); estimates the length of familiar figures; recog-
  nizes common two-dimensional shapes.
• Level 3. Basic numeracy: translates graphical information into
  fractions; interprets place value of whole numbers up to a thou-
  sand; interprets simple common everyday units of measurement.
• Level 4. Beginning numeracy: uses multiple mathematical opera-
  tions on whole numbers, fractions, decimals, or all of these.
• Level 5. Competent numeracy: solves multiple-operation problems
  involving everyday units of measurement, whole and mixed
  numbers, or all of these.
• Level 6. Mathematically skilled: solves multiple-operation problems
  involving fractions, ratios, and decimals; translates verbal and
  graphic representation information into symbolic, algebraic, and
  equation form.
• Level 7. Problem solving: extracts information from tables, charts,
  and visual and symbolic representations to identify and solve
  multistep problems.
• Level 8. Abstract problem solving: identifies the nature of an
  unstated mathematical problem embedded in verbal or graphic
  information, and translates it into algebraic or equation form to
  solve the problem.
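
   The eight competence levels above are defined as bands on a scale
constructed with the Rasch model mentioned earlier, which treats the
probability of a correct answer as a logistic function of the gap between a
pupil's ability and an item's difficulty, both expressed on the same scale.
A minimal sketch of the item response function follows; the ability and
difficulty values are arbitrary illustrations.

```python
# Rasch (one-parameter logistic) item response function. The ability (theta)
# and item difficulty (b) values below are arbitrary illustrations.
import math

def p_correct(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

for theta in (-1.0, 0.0, 1.0, 2.0):
    probs = ", ".join(f"b={b:+.1f}: {p_correct(theta, b):.2f}"
                      for b in (-0.5, 0.5, 1.5))
    print(f"ability {theta:+.1f} -> {probs}")
```

Once pupil abilities are estimated on this scale, cut points along it define
the competence levels used in reporting.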


Participants

Between 1995 and 1999, seven education ministries collected infor-
mation in SACMEQ I on students’ reading literacy in grade 6. Four-
teen ministries completed SACMEQ II studies between 2000 and
2002 in a study of students’ reading literacy and numeracy in grade 6.
Conditions varied greatly from country to country. For example, gross
national income was nearly 40 times more in the Seychelles (US$6,730)
than in Malawi (US$170). Government expenditure on education
varied between 30 percent in Swaziland and 7 percent in Tanzania,
while the percentage of an age group enrolled in primary school ranged
from about 40 percent in Mozambique to just over 90 percent in
Mauritius, the Seychelles, and South Africa (Murimba 2005b).
   Teachers, as well as pupils, took the achievement tests in a number
of countries.


Some Findings

Considerable differences in achievement existed between countries
(figure C.1.1). Only 1 percent of sixth graders in Malawi achieved the
“desirable” level in reading, whereas in Zimbabwe the figure was 37
percent. Almost 4 in 10 pupils in participating countries in SACMEQ
II reached the “minimum” level of mastery in reading (set by each
country before the test was administered), but only 1 in 10 reached the
“desirable” level.
 FIGURE C.1.1

 Percentage of Grade 6 Students Reaching Proficiency
 Levels in SACMEQ Reading, 1995–98

                              Desirable level (%)    Minimum level (%)
 Kenya                                23                    65
 Zimbabwe                             37                    56
 Mauritius                            27                    53
 Zanzibar (U. R. Tanzania)             5                    46
 Namibia                               8                    26
 Zambia                                2                    26
 Malawi                                1                    22

Source: UNESCO 2004, figure 3.1. Reproduced with permission.




   Comparisons of the reading literacy scores of urban and rural
students revealed large differences in favor of urban students in four
countries (Kenya, Namibia, Tanzania, and Zambia), while in Mauri-
tius and the Seychelles the difference was not statistically significant.
The likely causes of urban-rural differences were complex. Compared
to urban students, students in rural areas had lower levels of family
socioeconomic status, were older, were more likely to have repeated
a grade, and received less home support for their schoolwork.
Furthermore, rural schools in general had fewer and lower-quality
resources than urban schools, which was reflected in how teachers
assigned and corrected student homework, how frequently they met
with students’ parents, and how much support was provided by
inspectors (Zhang 2006).
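
   An urban-rural comparison of the kind reported above can be sketched as
follows. The scores are simulated, and the sketch uses a simple two-sample
test; the actual SACMEQ analyses apply sampling weights and design-based
standard errors that reflect the two-stage sample.

```python
# Illustrative urban-rural comparison of mean reading scores. Data are
# simulated; real SACMEQ analyses use sampling weights and design-based
# standard errors rather than this simple unweighted test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
urban = rng.normal(loc=520, scale=90, size=1200)
rural = rng.normal(loc=470, scale=90, size=1800)

t, p = stats.ttest_ind(urban, rural, equal_var=False)
print(f"urban mean {urban.mean():.1f}, rural mean {rural.mean():.1f}, "
      f"difference {urban.mean() - rural.mean():.1f} (t = {t:.2f}, p = {p:.4f})")
```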
   An interesting feature of SACMEQ was the use of results to compare
resource provision and trends in reading achievement over a time period
that was marked by a rapid increase in school enrollment in the region.
All six education systems that participated in SACMEQ I (1995) and
SACMEQ II (2000) registered an overall increase in resource provision
in schools between the two assessments (Murimba 2005a). In five of the
six countries, however, national mean literacy scores declined (figure
C.1.2); those differences were statistically significant only in Malawi,
Namibia, and Zambia. Overall, achievement scores declined by an average of
4 percent across the six countries.
   Each national report produced a series of recommendations for
policy makers. For example, the Tanzanian report recommended that
the government investigate gender disparities in school enrollment
and identify options to help eliminate the gender gap (Mrutu, Ponera,
and Nkumbi 2005). This action would include providing care to
orphaned children to relieve girls of heavy household responsibilities
so that they could attend school.
   A number of countries also assessed teacher subject mastery using
the test that was administered to students. In Tanzania, fewer than
half the teachers reached the highest level (Level 8) in reading
(46.1 percent) or in mathematics (43.9 percent).


FIGURE C.1.2

Changes in Literacy Scores between SACMEQ I and SACMEQ II

[Line chart plotting national mean literacy scores at SACMEQ I (1995–96) and
SACMEQ II (2000–01) for Kenya, Mauritius, Zanzibar (U. R. Tanzania), Namibia,
Zambia, and Malawi, together with the average across these countries.]

Source: UNESCO 2004, figure 2.4. Reproduced with permission.
   SACMEQ results have featured in presidential and national commissions (in
Zimbabwe and Namibia), in prime ministerial and cabinet reviews of education
policy (in Zanzibar), in national education sector studies (in Zambia), and
in reviews of a national education master plan (in Mauritius).
   In several countries, results were interpreted as indicating a need to
provide standards for resources in education. For example, benchmarks
for the provision of classroom facilities (such as desks per pupil and
books per pupil) were introduced in Kenya. In Zimbabwe, special
funds were provided for classroom supplies.
   High dropout and low completion rates prompted the Ministry of
Education in Kenya to strengthen its nonformal education sector to
cater for those who do not fit into the formal system. Also in Kenya,
SACMEQ findings on gender, regional disparities, and internal
inefficiencies were used to guide the development of action plans to
implement Education for All at national, provincial, and district levels
(Murimba 2005a).



C.2. PROGRAMME D’ANALYSE DES SYSTÈMES ÉDUCATIFS
DE LA CONFEMEN

Framework

The Programme d’Analyse des Systèmes Éducatifs de la CONFEMEN
(Programme on the Analysis of Education Systems, or PASEC) is
conducted under the auspices of the Conférence des Ministres de
l’Éducation des Pays ayant le Français en Partage (Conference of Edu-
cation Ministers of Francophone Countries across the World, or
CONFEMEN). It was launched in 1991 at a conference of franco-
phone education ministers in Djibouti, where the first study was car-
ried out in 1992.
   The primary objective of PASEC is to inform decision making in
education and, more specifically, to address important national policy
issues. It does so by assessing student achievement and by attempting
to identify key factors associated with it, and their associated costs, in
order to establish a hierarchy of potential educational interventions in
terms of their efficiency.
   Five features of PASEC are worth noting. First, it has an interna-
tional dimension in which proposals for country studies are considered
at a meeting of CONFEMEN member countries. If a proposal is
approved, the national CONFEMEN representative becomes respon-
sible for the establishment of an interdisciplinary group of experts
within the ministry of education which, in turn, will become respon-
sible for implementation (design of questionnaires, administration,
data entry and analysis, preparation of report). PASEC, however, is not
designed primarily to compare student achievement across countries.
   Second, students are tested at the beginning and end of the
academic year. This system means that in analyses, student entry
characteristics can be taken into account to obtain a measure of
student growth throughout the year.
   Third, studies in four countries (Guinea, Mali, Niger, and Togo)
were designed with a particular theme in mind. For example, Guinea
and Togo took as their theme teacher employment policies (includ-
ing teacher training) that had been introduced in Togo in 1983 and in
Guinea in 1998 to reduce the cost of hiring more teachers while
recognizing that those policies might affect the quality of education.
   Fourth, beginning in 1995, the same instruments were used in five
countries (Burkina Faso, Cameroon, Côte d’Ivoire, Senegal [1995/96],
and Madagascar [1997/98]), allowing international comparisons to
be made.
   Fifth, in two countries (Côte d’Ivoire and Senegal), representative
panels of students identified in grade 2 in 1995 were followed through
to grade 6 in 2000 in longitudinal studies.


Instrumentation

Tests (with multiple choice and constructed responses) were con-
structed in French and mathematics on the basis of elements that
were common to curricula in francophone countries in Africa. Tests
were designed for administration at the beginning and end of grades
2 and 5. The end-of-year tests contained some items from the
beginning-of-year tests in addition to items based on material covered
during the course of the year.
   At grade 2, the French tests assessed pupils’ reading vocabulary,
comprehension of sentences and texts, and writing. Grade 5 tests, in
addition to assessing comprehension, assessed spelling and aspects
of grammar.
   The mathematics tests at grade 5 included items that assessed
pupils’ knowledge of the properties of numbers and their ability to
carry out basic computations (addition and subtraction). Tests also
included items that required pupils to use addition, subtraction, mul-
tiplication, and division in the solution of problems, as well as items
that assessed pupils’ knowledge of decimals and fractions and of basic
geometric concepts.
   In Mauritius, a test of Arabic and, in Madagascar, a test of Malagasy
were also administered. In Cameroon, an English translation of the
French test was administered to anglophone students.
   Background data were collected through questionnaires administered to
pupils, covering personal factors (gender, age, nutrition, and language
spoken) and home background factors (parents’ education, availability of
books in the home, and distance to school), and through questionnaires
administered to teachers, covering their personal characteristics (gender,
age, and education or training) and their classroom environments.
   In analyses, background factors were related to student achievement
in an attempt to identify relationships between the two sets of vari-
ables. Particular attention was paid to “growth” or the “value added”
during the course of a year and to the contribution of in-school factors,
such as level of teacher training, class size, and textbook availability, as
well as nonschool factors, such as parental education, distance to
school, and home language (Bernard 1999; CONFEMEN 1999; Kulpoo
and Coustère 1999).
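
   In its simplest form, this value-added analysis regresses the end-of-year
score on the beginning-of-year score and on pupil and classroom factors, so
that the remaining coefficients describe contributions to growth during the
year. A minimal sketch with simulated data follows; the variable names and
effect sizes are illustrative assumptions, not PASEC estimates.

```python
# Illustrative "value added" regression in the spirit of PASEC: the
# end-of-year score is modeled as a function of the beginning-of-year score
# plus pupil and classroom factors. Data and coefficients are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "pretest": rng.normal(50, 10, n),        # beginning-of-year score
    "textbook": rng.integers(0, 2, n),       # pupil has own textbook
    "class_size": rng.integers(30, 80, n),
    "parent_educ": rng.integers(0, 12, n),   # years of parental schooling
})
df["posttest"] = (10 + 0.7 * df["pretest"] + 4 * df["textbook"]
                  + 0.3 * df["parent_educ"] + rng.normal(0, 8, n))

model = smf.ols("posttest ~ pretest + textbook + class_size + parent_educ",
                data=df).fit()
print(model.params.round(2))
```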


Participants

To date, 18 countries have participated in PASEC activities: Benin,
Burkina Faso, Cameroon, the Central African Republic, Chad, Côte
d’Ivoire, the Democratic Republic of Congo, Djibouti, Gabon,
Guinea, Madagascar, Mali, Mauritania, Mauritius, Niger, the Republic
of Congo, Senegal, and Togo.


Some Findings

Results suggest low levels of achievement as reflected in reading and
mathematics test scores (figure C.2.1).
FIGURE C.2.1

Percentage of Grade 5 Pupils with Low Achievement, PASEC, 1996–2001

[Bar chart showing, for Senegal, Madagascar, Burkina Faso, Togo, Côte
d’Ivoire, and Cameroon, the percentage of grade 5 pupils with low achievement
in French and in mathematics; values range from roughly 14 to 43 percent.]

Source: UNESCO 2004, figure 3.32. Reproduced with permission.
Note: The assessment was carried out in Burkina Faso, Cameroon, Côte d’Ivoire, and Senegal in
1995/96; in Madagascar in 1997/98; and in Togo in 2000/01. Countries are ranked by proportion of
low-achieving pupils in mathematics. Low achievement is defined as a score below the 25th percentile
on reading and mathematics.


   “Low achievement” was defined as a score below the 25th percentile on tests of reading and
mathematics.
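
   A sketch of this cutoff rule is shown below; the scores are simulated, and
the sketch assumes that the 25th percentile is computed on the pooled
distribution across countries and then applied country by country, since the
reference distribution is not specified here.

```python
# "Low achievement" is a score below the 25th percentile. This sketch assumes
# the cutoff is set on the pooled distribution and then applied country by
# country; all scores are simulated for illustration.
import numpy as np

rng = np.random.default_rng(2)
country_scores = {
    "Country A": rng.normal(58, 14, 1500),
    "Country B": rng.normal(50, 15, 1500),
    "Country C": rng.normal(45, 16, 1500),
}

pooled = np.concatenate(list(country_scores.values()))
cutoff = np.percentile(pooled, 25)

for name, scores in country_scores.items():
    print(f"{name}: {np.mean(scores < cutoff):.0%} of pupils below the cutoff")
```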
   Several analyses of PASEC data have been carried out. In one of
those, data from five countries (Burkina Faso, Cameroon, Côte
d’Ivoire, Madagascar, and Senegal) were used in a hierarchical linear
model to assess individual, school level, and national characteristics
determining fifth-grade students’ achievements in French and math-
ematics (Michaelowa 2001). The following were among the findings
that emerged.
   First, a variety of individual student and family characteristics
(including parents’ literacy and the use of French in the student’s
home) were related to student achievement. Second, although students
might appear to benefit from grade repetition, gains were only tempo-
rary. Third, both teachers’ initial education and regular in-service
training appear important in determining student achievement.
Fourth, the number of days teachers were absent from school negatively
affected students’ achievements. Fifth, even though they were paid
less, “voluntary” teachers (employed by pupils’ parents) were more
effective than teachers who were civil servants.
   Sixth, teacher union membership was significantly and negatively
related to student achievement. Seventh, the availability of student
textbooks had a strong positive effect on learning achievement.
Eighth, class size (up to 62 students) was positively related to achieve-
ment. Ninth, learning in a multigrade classroom had a positive effect
on achievement. Tenth, students in schools visited during the year by
an inspector performed better than students in schools that did not
have a visit. Finally, girls’ achievement seemed to benefit from being
taught by a female; boys’ achievement seemed to benefit from being
taught by a male.
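
   The multilevel analysis described above can be sketched with a two-level
random-intercept model (pupils nested in schools). The data, variable names,
and effect sizes below are simulated illustrations, and the sketch omits the
country level included in the published analysis.

```python
# Minimal two-level sketch of a hierarchical linear model: pupils nested in
# schools, with a random school intercept. Data and variables are simulated
# illustrations only; the published analysis also includes a country level.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_schools, pupils_per_school = 40, 25
school_effect = rng.normal(0, 5, n_schools)

rows = []
for s in range(n_schools):
    for _ in range(pupils_per_school):
        french_home = int(rng.integers(0, 2))   # French spoken at home
        repeated = int(rng.integers(0, 2))      # pupil has repeated a grade
        score = (50 + 6 * french_home - 3 * repeated
                 + school_effect[s] + rng.normal(0, 10))
        rows.append({"school": s, "french_home": french_home,
                     "repeated": repeated, "score": score})
df = pd.DataFrame(rows)

model = smf.mixedlm("score ~ french_home + repeated", data=df,
                    groups=df["school"]).fit()
print(model.summary())
```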



C.3. LABORATORIO LATINOAMERICANO DE EVALUACIÓN
DE LA CALIDAD DE LA EDUCACIÓN

Framework

The First International Comparative Study of Language and
Mathematics in Latin America was carried out by the Laboratorio
Latinoamericano de Evaluación de la Calidad de la Educación
(Latin American Laboratory for Assessment of the Quality of
Education, or LLECE). This network of national systems of educa-
tion in Latin America and the Caribbean was created in 1994 and
is coordinated by the UNESCO Regional Office for Latin America
and the Caribbean.
   The main aim of the study was to provide information on students’
achievements and associated factors that would be useful in the for-
mulation and execution of education policies within countries. It
would do so by assessing the achievements of primary-school popula-
tions to address the following questions: What do students learn? At
what levels does learning occur? What skills have students developed?
When does the learning occur? Under what conditions does learning
occur? (Casassus and others 1998).
   A comparative framework was considered one of the best ways to
increase understanding of the state of education within countries.
The need for an international study in Latin America was indicated
by the fact that few countries in the region had participated in such a
study and, when they had, the studies had not taken account of
curriculum features specific to the region.
Instruments

Achievement tests (two forms) in language and in mathematics—in
which the curriculum content of each participating country was repre-
sented—were developed. Tests consisted of multiple-choice items and, in
language only, open-ended items.
   Language components included reading comprehension; metalin-
guistic practice; and production of text in Spanish, except in Brazil
where students were tested in Portuguese.
   Mathematics components included numbers, operations using
natural numbers, common fractions, geometry, and measurement.
   Extensive information was collected in questionnaires (completed
by students, teachers, principals, and parents or guardians) on factors
that were considered likely to be associated with student achievement
(for example, school location and type, educational level of parents or
guardians, and teachers’ and students’ perceptions of the availability
of learning resources in the school).


Participants

In 1997, 13 countries participated in a survey: Argentina, Bolivia, Bra-
zil, Chile, Colombia, Costa Rica, Cuba, the Dominican Republic,
Honduras, Mexico, Paraguay, Peru, and República Bolivariana de Ven-
ezuela. Data for 11 countries are included in the first report of the
survey.
   In each country, samples of approximately 4,000 students in grade
3 (8- and 9-year-olds) and in grade 4 (9- and 10-year-olds) were
assessed. The “oldest 20 percent of the total population” was excluded
(Casassus and others 1998, 18).


Some Findings

Results, classified by type of school attended (public or private) and
location (cities with population over 1 million, urban, and rural),
indicate that Cuban students’ achievement levels, regardless of
school location, are far ahead of those in other countries (tables
C.3.1 and C.3.2).
TABLE C.3.1
Percentage of Students Who Reached Each Performance Level in Language, by Type of School and Location, LLECE 1997
                            Public                   Private                   Megacity                   Urban                   Rural
Country             Level   Level    Level   Level   Level     Level   Level    Level     Level   Level   Level   Level   Level   Level   Level
                      I       II      III      I       II       III      I        II       III      I       II     III      I       II     III

Argentina            95         77    57      99      93        78      96       85        72      96      79      59      88      62      42
Bolivia              87         55    30      91      70        46      90       66        39      87      58      35      77      40      24
Brazil               95         80    54      98      93        72      96       88        62      95      82      58      84      62      38
Chile                93         71    49      97      86        67      94       76        53      95      79      60      89      63      41
Colombia             89         59    35      97      81        56      96       79        53      89      60      36      89      57      33
Cuba                 100        98    92     n.a.    n.a.      n.a.    100       99        93     100      98      92     100      98      92
Dominican Rep.       77         52    30      83      64        42      84       65        42      73      44      25      73      39      20
Honduras             87         55    29      94      73        44      92       67        38      87      55      29      78      35      17
Mexico               89         58    38      96      84        65      94       70        50      89      64      43      82      48      30
Paraguay             88         60    37      93      75        54     n.a.      n.a.     n.a.     90      67      44      81      51      32




Peru                 86         55    29      94      78        54      92       70        43      85      57      34      71      30      13
Venezuela,           88         59    38      91      70        49      91       68        48      88      60      38      84      58      39
R.B.de
Source: UNESCO 2001, table 8.
Note: n.a. = not applicable.




TABLE C.3.2
Percentage of Students Who Reached Each Performance Level in Mathematics, by Type of School and Location, LLECE 1997




                                     Public                   Private                   Megacity                   Urban                   Rural




Country                     Level    Level    Level   Level   Level     Level   Level    Level     Level   Level   Level   Level   Level   Level   Level
                              I        II      III      I       II       III      I        II       III      I       II     III      I       II     III
Argentina                       96    54       12      98      71        23      98       70        26      96      54      11      94      43      6
Bolivia                         93    43       9       96      59        18      95       49        12      94      51      14      89      36      8
Brazil                          93    52       12      97      67        26      96       58        17      94      55      15      84      40      7
Chile                           92    46       7       97      57        15      94       49        10      95      52      12      87      38      6
Colombia                        93    42       5       97      55        10      97       53        8       93      43      6       92      50      12
Cuba                        100       92       79     n.a.    n.a.      n.a.    100       95        82      99      90      76      99      50      72
Dominican Rep.                  82    37       4       86      43        7       86       42        6       81      36      4       79      38      7
Honduras                        84    36       7       93      39        5       87       35        3       86      39      8       78      23      13
Mexico                          94    55       10      98      69        20      97       62        13      94      58      13      90      46      10
Paraguay                        87    29       2       90      49        12     n.a.      n.a.     n.a.     88      42      9       82      34      8
Peru                            87    29       2       94      54        11      88       43        8       89      33      4       78      23      2
Venezuela, R.B. de              76    25       2       76      33        5       75       26        3       77      27      3       68      22      2
Source: UNESCO 2001, table 8.
Note: n.a. = not applicable.
   More than 90 percent of Cuban students achieved
the highest proficiency level (Level III) in language. With one exception
(rural schools), more than 75 percent did so in mathematics. Whereas
72 percent of rural students in Cuba achieved Level III in mathe-
matics, fewer than 10 percent of rural students did so in most of the
remaining countries.
   Further analyses of LLECE data focused on the extent to which the
relationship between socioeconomic status (based on parental level of
schooling) and achievement varied across countries (see figure C.3.1).
The data indicate that socioeconomic gradients vary considerably
among countries; the relationship is more pronounced in Argentina
and Brazil than in Cuba, which had relatively little variation in level of
parental education. Although students in private schools outper-
formed students in public schools, differences between the groups
were not significant when student socioeconomic status was taken
into account (Summit of Americas 2003).


 FIGURE C.3.1

 Socioeconomic Gradients for 11 Latin American Countries, LLECE

 [Chart plotting language scores (roughly 200 to 350) against parents’
 education (0 to 15 years of schooling), with a socioeconomic gradient line
 for each country: Cuba, Chile, Argentina, Paraguay, Brazil, Colombia,
 República Bolivariana de Venezuela, Bolivia, Honduras, the Dominican
 Republic, and Mexico.]


Source: Willms and Somers 2005.
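
   Each socioeconomic gradient in figure C.3.1 is, in essence, the
within-country regression slope of the language score on parents’ years of
schooling. A minimal sketch with simulated data follows; the slopes and
intercepts are arbitrary and are not the LLECE estimates.

```python
# Illustrative socioeconomic gradients: within each (hypothetical) country,
# regress the language score on parents' years of schooling and compare the
# slopes. Data are simulated; the actual gradients are those reported by
# Willms and Somers (2005).
import numpy as np

rng = np.random.default_rng(4)

def simulate(slope, intercept, n=1000):
    parent_educ = rng.integers(0, 16, n)
    score = intercept + slope * parent_educ + rng.normal(0, 25, n)
    return parent_educ, score

for name, (slope, intercept) in {"flat gradient": (1.0, 320),
                                 "steep gradient": (5.0, 230)}.items():
    x, y = simulate(slope, intercept)
    est_slope, _ = np.polyfit(x, y, 1)
    print(f"{name}: about {est_slope:.1f} score points per year of schooling")
```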
   Cuba had the least variation in parents’ educational attainment, as
well as the highest level of student achievement. Further analyses
revealed that, in comparison with other countries, Cuba tended to
have more day care, more home educational activities, smaller classes,
more highly trained teachers, and fewer multigrade or ability-grouped
classes (Willms and Somers 2001). In a follow-up study, LLECE results
were used to identify schools with outstanding results in seven coun-
tries: Argentina, Bolivia, Chile, Colombia, Costa Rica, Cuba, and
República Bolivariana de Venezuela (LLECE 2002).
   Despite this variety of analyses, the Task Force on Education Reform
in Central America (2000, 19) in its report titled Tomorrow Is Too Late
noted that

  in almost every case there is no clear policy dictating how evaluation
  results can and should be used. Tests of academic achievement have
  not yet become a part of the accountability policies that are being
  demanded by various groups. There has been no discussion of the type
  of decisions that might be based on these results, and there is little
  consensus on the intrinsic value of assessing student performance. As a
  result, these programs are especially vulnerable to changes in government
  and even in senior ministry personnel.
                     REFERENCES



Arregui, P., and C. McLauchlan. 2005. “Utilization of Large-Scale Assess-
ment Results in Latin America.” Unpublished document prepared for the
Partnership for Educational Revitalization in the Americas and the World
Bank Institute.

Beaton, A. E., T. N. Postlethwaite, K. N. Ross, D. Spearritt, and R. M. Wolf.
1999. The Benefits and Limitations of International Educational Achievement
Studies. Paris: UNESCO International Institute for Educational Planning.

Benveniste, L. 2000. “Student Assessment as a Political Construction: The
Case of Uruguay.” Education Policy Analysis Archives 8 (32): 1–41.

———. 2002. “The Political Structuration of Assessment: Negotiating State
Power and Legitimacy.” Comparative Education Review 46: 89–118.

Bernard, J.-M. 1999. “Les Enseignants du Primaire dans Cinq Pays du
Programme d’Analyse des Systèmes Éducatifs de la CONFEMEN: Le Rôle
du Maître dans le Processus d’Acquisition des Élèves.” Report of the
Working Group on the Teaching Profession, Francophone Section, of the
Association for the Development of Education in Africa (ADEA). Paris:
ADEA.

Bhutan, Board of Examinations, Ministry of Education. 2004. National
Educational Assessment in Bhutan: A Benchmark of Student Achievement in
Literacy and Numeracy at Class 6, 2003. Thimphu, Bhutan: Ministry of
Education.






Braun, H., and A. Kanjee. 2007. “Using Assessment to Improve Education
in Developing Countries.” In Educating All Children: A Global Agenda, ed.
J. E. Cohen, D. E. Bloom, and M. B. Malin, 303–53. Cambridge, MA:
MIT Press.

Campbell, J. R., D. L. Kelly, I. V. S. Mullis, M. O. Martin, and M. Sainsbury.
2001. Framework and Specifications for PIRLS Assessment 2001. 2nd ed.
Chestnut Hill, MA: Boston College.

Casassus, J., J. E. Froemel, J. C. Palafox, and S. Cusato. 1998. First Interna-
tional Comparative Study of Language, Mathematics, and Associated Factors
in Third and Fourth Grades. Santiago, Chile: Latin American Laboratory for
Evaluation of the Quality of Education.

Chinapah, V. 1997. Handbook on Monitoring Learning Achievement:
Towards Capacity Building. Paris: United Nations Educational, Scientific,
and Cultural Organization.

Clarke, M. 2005. NAPE Technical Analysis and Recommendations. Kampala:
Uganda National Examinations Board.

CONFEMEN (Conférence des Ministres de l’Éducation des Pays ayant le
Français en Partage). 1999. Les Facteurs de l’Efficacité dans l’Enseignement
Primaire: Les Résultats du Programme PASEC sur Neuf Pays d’Afrique et de
l’Océan Indien. Dakar: CONFEMEN.

Connecticut Department of Education. 2006. “State Releases Connecticut
Mastery Test Results.” News, August 9. http://www.sde.ct.gov/sde/lib/sde/
PDF/PressRoom/2006cmtresults.pdf.

Coulombe, S., J.-F. Tremblay, and S. Marchand. 2004. International Adult
Literacy Survey: Literacy Scores, Human Capital, and Growth across Fourteen
OECD Countries. Ottawa: Statistics Canada.

Crespo, M., J. F. Soares, and A. de Mello e Souza. 2000. “The Brazilian
National Evaluation System of Basic Education: Context, Process, and
Impact.” Studies in Educational Evaluation 26: 105–25.

Delannoy, F. 2000. Education Reforms in Chile 1980–98: A Lesson in
Pragmatism. Washington, DC: World Bank.

Eivers, E., G. Shiel, R. Perkins, and J. Cosgrove. 2005. The 2004 National
Assessment of English Reading. Dublin: Educational Research Centre.

Elley, W. B. 1992. How in the World Do Students Read? IEA Study of
Reading Literacy. The Hague, Netherlands: International Association for the
Evaluation of Educational Achievement.



———, ed. 1994. The IEA Study of Reading Literacy: Achievement and
Instruction in Thirty-Two School Systems. Oxford, U.K.: Pergamon.

———. 2005. “How TIMSS-R Contributed to Education in Eighteen
Developing Countries.” Prospects 35 (2): 199–212.

Ethiopia, National Organisation for Examinations. 2005. Second National
Learning Assessment of Ethiopia. Addis Ababa: National Organisation for
Examinations.

Ferrer, G. 2006. Educational Assessment Systems in Latin America: Current
Practice and Future Challenges. Washington, DC: Partnership for Educa-
tional Revitalization in the Americas.

Ghana, Ministry of Education, Youth, and Sports. 2004. Results from
Ghanaian Junior Secondary 2 Students’ Participation in TIMSS 2003 in
Mathematics and Science. Accra: Ministry of Education, Youth, and Sports.

Greaney, V., and T. Kellaghan. 1996. Monitoring the Learning Outcomes of
Education Systems. Washington, DC: World Bank.

Hanushek, E. A., and D. D. Kimko. 2000. “Schooling, Labor-Force Quality,
and the Growth of Nations.” American Economic Review 90 (5): 1184–208.

Hanushek, E. A., and L. Wössmann. 2007. Education Quality and Economic
Growth. Washington, DC: World Bank.

Himmel, E. 1996. “National Assessment in Chile.” In National Assessments:
Testing the System, ed. P. Murphy, V. Greaney, M. E. Lockheed, and
C. Rojas, 111–28. Washington, DC: World Bank.

———. 1997. “Impacto Social de los Sistemas de Evaluación del
Rendimiento Escolar: El Caso de Chile.” In Evaluación y reforma educativa:
Opciones de política, ed. B. Álvarez H. and M. Ruiz-Casares, 125–57.
Washington, DC: ABEL/PREAL/U.S. Agency for International Develop-
ment.

Horn, R., L. Wolff, and E. Velez. 1992. “Educational Assessment Systems
in Latin America: A Review of Issues and Recent Experience.” Major Project
of Education in Latin America and the Caribbean Bulletin 27: 7–27.

Howie, S. 2000. “TIMSS-R in South Africa: A Developing Country
Perspective.” Paper presented at American Educational Research Associa-
tion annual meeting, New Orleans, April 24–28.

———. 2002. “English Proficiency and Contextual Factors Influencing
Mathematics Achievement of Secondary School Pupils in South Africa.”
PhD thesis, University of Twente, the Netherlands.




Howie, S., and C. Hughes. 2000. “South Africa.” In The Impact of TIMSS on
the Teaching and Learning of Mathematics and Science, ed. D. Robitaille,
A. Beaton, and T. Plomp, 139–45. Vancouver, BC: Pacific Educational Press.

Hoxby, C. E. 2002. “The Cost of Accountability.” Working Paper 8855,
National Bureau of Economic Research, Cambridge, MA.

Husén, T. 1973. “Foreword.” In Science Achievement in Nineteen Countries,
ed. L. C. Comber and J. P. Keeves, 13–24. New York: Wiley.

Husén, T., and T. N. Postlethwaite. 1996. “A Brief History of the Interna-
tional Association for the Evaluation of Educational Achievement (IEA).”
Assessment in Education 3 (2): 129–41.

IEA (International Association for the Evaluation of Educational Achieve-
ment). 2000. Framework and Specifications for PIRLS Assessment 2001.
Chestnut Hill, MA: International Study Center, Boston College.

IIEP (International Institute for Educational Planning). 2007. “Southern and
Eastern Africa Consortium for Monitoring Educational Quality.” IIEP, Paris.
http://www.unesco.org/iiep/eng/networks/sacmeq/sacmeq.htm.

Ilon, L. 1996. “Considerations for Costing National Assessments.” In
National Assessments: Testing the System, ed. P. Murphy, V. Greaney,
M. E. Lockheed, and C. Rojas, 69–88. Washington, DC: World Bank.

India, National Council of Educational Research and Training, Department
of Educational Measurement and Evaluation. 2003. Learning Achievement
of Students at the End of Class V. New Delhi: Department of Educational
Measurement and Evaluation.

Ishino, T. 1995. “Japan.” In Performance Standards in Education: In Search of
Quality, 149–61. Paris: OECD.

Johnson, E. G. 1992. “The Design of the National Assessment of Educa-
tional Progress.” Journal of Educational Measurement 29 (2): 95–110.

Jones, L. V. 2003. “National Assessment in the United States: The Evolu-
tion of a Nation’s Report Card.” In International Handbook of Educational
Evaluation, ed. T. Kellaghan and D. L. Stufflebeam, 883–904. Dordrecht,
Netherlands: Kluwer Academic.

Kanjee, A. 2006. “The State of National Assessments of Learner Achieve-
ment.” Unpublished paper prepared for the Human Sciences Research
Council, Pretoria, South Africa.

Keeves, J. P. 1995. “The Contribution of IEA Research to Australian
Education.” In Reflections on Educational Achievement: Papers in Honour of




T. Neville Postlethwaite, ed. W. Bos and R. H. Lehmann, 137–58. New
York: Waxmann.

Kellaghan, T. 1996. “IEA Studies and Educational Policy.” Assessment in
Education 3 (2): 143–60.

———. 1997. “Seguimiento de los resultados educativos nacionales.” In
Evaluación y reforma educativa: Opciones de política, ed. B. Álvarez H. and
M. Ruiz-Casares, 23–65. Washington, DC: ABEL/PREAL/U.S. Agency for
International Development.

———. 2003. “Local, National and International Levels of System Evalua-
tion: Introduction.” In International Handbook of Educational Evaluation, ed.
T. Kellaghan and D. L. Stufflebeam, 873–82. Dordrecht, Netherlands:
Kluwer Academic.

———. 2006. “What Monitoring Mechanisms Can Be Used for Cross-
National (and National) Studies?” In Cross-National Studies of the Quality of
Education: Planning Their Design and Managing Their Impact, ed. K. N. Ross
and I. J. Genevois, 51–55. Paris: International Institute for Educational
Planning.

Kellaghan, T., and V. Greaney. 2001a. “The Globalisation of Assessment in
the 20th Century.” Assessment in Education 8 (1): 87–102.

———. 2001b. Using Assessment to Improve the Quality of Education. Paris:
International Institute for Educational Planning.

———. 2004. Assessing Student Learning in Africa. Washington, DC:
World Bank.

Khaniya, T., and J. H. Williams. 2004. “Necessary but Not Sufficient:
Challenges to (Implicit) Theories of Educational Change—Reform in
Nepal’s Education System.” International Journal of Educational Develop-
ment 24 (3): 315–28.

Kirsch, I. 2001. The International Adult Literacy Survey (IALS): Understanding
What Was Measured. Princeton, NJ: Educational Testing Service.

Kulpoo, D., and P. Coustère. 1999. “Developing National Capacities for
Assessment and Monitoring through Effective Partnerships.” In Partnerships
for Capacity Building and Quality Improvements in Education: Papers from
the ADEA 1997 Biennial Meeting, Dakar. Paris: Association for the Devel-
opment of Education in Africa.

Lesotho, Examinations Council of Lesotho and National Curriculum
Development Centre. 2006. Lesotho: National Assessment of Educational




Progress, 2004. Maseru: Examinations Council of Lesotho and National
Curriculum Development Centre.

LLECE (Latin American Laboratory for Evaluation of the Quality of
Education). 2002. Qualitative Study of Schools with Outstanding Results in
Seven Latin American Countries. Santiago: LLECE.

Lockheed, M. E., and A. Harris. 2005. “Beneath Education Production
Functions: The Case of Primary Education in Jamaica.” Peabody Journal of
Education 80 (1): 6–28.

Makuwa, D. 2005. The SACMEQ II Project in Namibia: A Study of the
Conditions of Schooling and Quality of Education. Harare: Southern and
Eastern Africa Consortium for Monitoring Educational Quality.

McMeekin, R. W. 2000. Implementing School-Based Merit Awards: Chile’s
Experiences. Washington, DC: World Bank.

Michaelowa, K. 2001. “Primary Education Quality in Francophone Sub-
Saharan Africa: Determinants of Learning Achievement and Efficiency
Considerations.” World Development 29 (10): 1699–716.

Mrutu, A., G. Ponera, and E. Nkumbi. 2005. The SACMEQ II Project in
Tanzania: A Study of the Conditions of Schooling and the Quality of Educa-
tion. Harare: Southern and Eastern Africa Consortium for Monitoring
Educational Quality.

Mullis, I. V. S., A. M. Kennedy, M. O. Martin, and M. Sainsbury. 2006.
PIRLS 2006: Assessment Framework and Specifications. Chestnut Hill, MA:
International Study Center, Boston College.

Mullis, I. V. S., M. O. Martin, E. J. Gonzalez, and S. J. Chrostowski. 2004.
TIMSS 2003 International Mathematics Report: Findings from IEA’s Trends in
International Mathematics and Science Study at the Fourth and Eighth
Grades. Chestnut Hill, MA: International Study Center, Boston College.

Mullis, I. V. S., M. O. Martin, E. J. Gonzalez, and A. M. Kennedy. 2003.
PIRLS 2001 International Report: IEA’s Study of Reading Literacy Achieve-
ment in Primary Schools. Chestnut Hill, MA: International Study Center,
Boston College.

Mullis, I. V. S., M. O. Martin, G. J. Ruddock, C. Y. O’Sullivan, A. Arora,
and E. Erberber. 2005. TIMSS 2007 Assessment Frameworks. Chestnut Hill,
MA: International Study Center, Boston College.

Murimba, S. 2005a. “The Impact of the Southern and Eastern Africa
Consortium for Monitoring Educational Quality (SACMEQ).” Prospects
35 (1): 91–108.



———. 2005b. “The Southern and Eastern Africa Consortium for Monitoring
Educational Quality (SACMEQ): Mission Approach and Projects.” Prospects
35 (1): 75–89.

Nassor, S., and K. A. Mohammed. 1998. The Quality of Education: Some
Policy Suggestions Based on a Survey of Schools—Zanzibar. SACMEQ Policy
Research 4, International Institute for Educational Planning, Paris.

Naumann, J. 2005. “TIMSS, PISA, PIRLS, and Low Educational Achieve-
ment in World Society.” Prospects 35 (2): 229–48.

OECD (Organisation for Economic Co-operation and Development). 2001.
Outcomes of Learning: Results from the 2000 Program for International Student
Assessment of 15-Year-Olds in Reading, Mathematics, and Science Literacy.
Paris: OECD. http://nces.ed.gov/pubs2002/2002115.pdf.

———. 2003. The PISA 2003 Assessment Framework: Reading, Mathematics,
Science and Problem Solving Knowledge and Skills. Paris: OECD.

———. 2004a. First Results from PISA 2003: Executive Summary. Paris:
OECD. http://www.oecd.org/dataoecd/1/63/34002454.pdf.

———. 2004b. Learning for Tomorrow’s World: First Results from PISA
2003. Paris: OECD.

———. 2007. “Sample Questions: PISA Mathematics with Marking
Guide.” OECD, Paris. http://pisa-sq.acer.edu.au.

OECD (Organisation for Economic Co-operation and Development) and
UNESCO (United Nations Educational, Scientific, and Cultural Organiza-
tion) Institute for Statistics. 2003. Literacy Skills for the World of Tomorrow:
Further Results from PISA 2000. Paris and Montreal: OECD and UNESCO
Institute for Statistics.

Olivares, J. 1996. “Sistema de Medición de la Calidad de la Educación de
Chile: SIMCE, Algunos Problemas de la Medición.” Revista Iberoamericana
de Educación 10. http://www.rieoei.org/oeivirt/rie10a07.htm.

Passos, A., T. Nahara, F. Magaia, and C. Lauchande. 2005. The SACMEQ
II Project in Mozambique: A Study of the Conditions of Schooling and the
Quality of Education. Harare: Southern and Eastern Africa Consortium for
Monitoring Educational Quality.

Perera, L., S. Wijetunge, W. A. de Silva, and A. A. Navaratne. 2004.
Achievement after Four Years of Schooling. National Assessment of Achievement
of Grade Four Pupils in Sri Lanka: National Report. Colombo: National
Education Research and Evaluation Centre, University of Colombo.




Postlethwaite, T. N. 2004. “What Do International Assessment Studies
Tell Us about the Quality of School Systems?” Background paper for
Education for All Global Monitoring Report 2005, United Nations Educa-
tional, Scientific, and Cultural Organization, Paris.

Prakash, V., S. K. S. Gautam, and I. K. Bansal. 2000. Student Achievement
under MAS: Appraisal in Phase-II States. New Delhi: National Council of
Educational Research and Training.

Ramirez, F. O., X. Luo, E. Schofer, and J. W. Meyer. 2006. “Student
Achievement and National Economic Growth.” American Journal of
Education 113 (1): 1–29.

Ravela, P. 2005. “A Formative Approach to National Assessments: The
Case of Uruguay.” Prospects 35 (1): 21–43.

Reddy, V. 2005. “Cross-National Achievement Studies: Learning from
South Africa’s Participation in the Trends in International Mathematics
and Science Study.” Compare 35 (1): 63–77.

———. 2006. Mathematics and Science Achievement at South African
Schools in TIMSS 2003. Cape Town, South Africa: Human Sciences
Research Council Press.

Robitaille, D. F., A. E. Beaton, and T. Plomp, eds. 2000. The Impact of
TIMSS on the Teaching and Learning of Mathematics and Science. Vancouver,
BC: Pacific Educational Press.

Rojas, C., and J. M. Esquivel. 1998. “Los Sistemas de Medición del Logro
Academico en Latino América.” LCSHD Paper 25, World Bank,
Washington, DC.

Ross, K. 1987. “Sample Design.” International Journal of Educational
Research 11 (1): 57–75.

Ross, K., and T. N. Postlethwaite. 1991. Indicators of the Quality of Educa-
tion: A Study of Zimbabwean Primary Schools. Harare: Ministry of Educa-
tion and Culture; Paris: International Institute for Educational Planning.

Shabalala, J. 2005. The SACMEQ II Project in Swaziland: A Study of the
Conditions of Schooling and the Quality of Education. Harare: Southern and
Eastern Africa Consortium for Monitoring Educational Quality.

Shukla, S., V. P. Garg, V. K. Jain, S. Rajput, and O. P. Arora. 1994.
Attainments of Primary School Children in Various States. New Delhi:
National Council of Educational Research and Training.

Sofroniou, N., and T. Kellaghan. 2004. “The Utility of Third International
Mathematics and Science Study Scales in Predicting Students’ State



Examination Performance.” Journal of Educational Measurement
41 (4): 311–29.

Štraus, M. 2005. “International Comparisons of Student Achievement as
Indicators for Educational Policy in Slovenia.” Prospects 35 (2): 187–98.

Summit of Americas. 2003. Regional Report: Achieving the Educational
Goals. Santiago: Ministry of Education, Chile; Paris: United Nations
Educational, Scientific, and Cultural Organization.

Task Force on Education Reform in Central America. 2000. Tomorrow Is
Too Late. http://thedialogue.org/publications/preal/tomorrow.pdf.

UNEB (Uganda National Examinations Board). 2006. The Achievements of
Primary School Pupils in Uganda in English Literacy and Numeracy.
Kampala: UNEB.

UNESCO (United Nations Educational, Scientific, and Cultural Organiza-
tion). 1990. Final Report of the World Congress on Education for All: Meeting
Basic Learning Needs, Jomtien, Thailand. Paris: UNESCO.

———. 2000. The Dakar Framework for Action—Education for All: Meeting
Our Collective Commitments. Paris: UNESCO.

———. 2001. Technical Report of the First International Comparative Study.
Santiago: Regional Office for Latin America and the Caribbean.

———. 2002. EFA Global Monitoring Report 2002: Is the World on Track?
Paris: UNESCO.

———. 2004. EFA Global Monitoring Report 2005: The Quality Imperative.
Paris: UNESCO.

U.S. National Center for Education Statistics. 2005. National Assessment of
Educational Progress: The Nation’s Report Card, Reading 2005. Washington,
DC: U.S. National Center for Education Statistics.

———. 2006. “NAEP Overview.” U.S. National Center for Education
Statistics, Washington, DC. http://nces.ed.gov/nationsreportcard/about/.

———. n.d. “Comparing NAEP, TIMSS, and PISA in Mathematics and
Science.” U.S. National Center for Education Statistics, Washington, DC.
http://nces.ed.gov/timss/pdf/naep_timss_pisa_comp.pdf.

Wilkins, J. L. M., M. Zembylas, and K. J. Travers. 2002. “Investigating
Correlates of Mathematics and Science Literacy in the Final Year of
Secondary School.” In Secondary Analysis of the TIMSS Data, ed.
D. F. Robitaille and A. E. Beaton, 291–316. Dordrecht, Netherlands:
Kluwer Academic.




Willms, J. D., and M.-A. Somers. 2005. “Raising the Learning Bar in Latin
America: Measuring Student Outcomes.” Policy Brief, Canadian Research
Institute for Social Policy, University of New Brunswick, Fredericton.

Winograd, P., and B. Thorstensen. 2004. “Using Large Scale Assessments
to Inform the Policies and Practices That Support Student Learning.”
Working paper developed for the International Reading Association and
the World Bank’s Global National Assessment Training Project, Office of
Education Accountability, Santa Fe, NM.

Wolff, L. 1998. “Educational Assessment in Latin America: Current
Progress and Future Challenges.” Working Paper 11, Programa de Promo-
ción de la Reforma Educativa en America Latina y el Caribe, Partnership
for Educational Revitalization in the Americas, Washington, DC.

World Bank. 2004. Vietnam: Reading and Mathematics Assessment Study.
Vols. 1–3. Washington, DC: World Bank.

———. 2007. EdStats database. http://www1.worldbank.org/education/
edstats/.

World Declaration on Education for All. 1990. Adopted by the World
Conference on Education for All, Meeting Basic Learning Needs, Jomtien,
Thailand, March 5–9. New York: United Nations Educational, Scientific,
and Cultural Organization. http://www.unesco.org/education/information/
nfsunesco/pdf/JOMTIE_E.PDF.

Zhang, Y. 2006. “Urban-Rural Literacy Gaps in Sub-Saharan Africa: The
Roles of Socioeconomic Status and School Quality.” Comparative Educa-
tion Review 50 (4): 581–602.
                           INDEX



Boxes, figures, and tables are indicated by “b,” “f,” and “t.”


administration of tests and questionnaires, 29–30, 50, 52t, 74
affective outcome assessment, 38
Africa
   objectives for national assessment in, 18
   PIRLS in, 116
   population for assessment in, 31
   primary school enrollment in, 72
   teacher assessment in, 13
   TIMSS in, 66
   See also Southern and Eastern Africa Consortium for Monitoring Educational Quality (SACMEQ)
Analyzing Data from a National Assessment of Educational Achievement (Volume 4 in series), 5
Arab countries, school attendance in, 72
Argentina, 3, 25, 28–29, 116, 118t, 120, 140, 141t, 142t, 143
Armenia, 113t
assessment frameworks, 34–39, 52t
Australia, 20, 68, 113t, 122, 123f, 125f, 126f
Austria, 122, 123f, 125f, 126f
Azerbaijan, 120

Bahrain, 73t, 113t
Bangladesh, 13
Basque Country, Spain, 113t
Belgium, 113t, 123f, 125f, 126f
Belize, 116, 118t
benchmarks, 66, 72, 73t, 113t
Benin, 137
Bhutan, 17–18, 82
Bolivia, 140, 141t, 142t, 143f
Botswana, 73t, 92, 113t, 127
Brazil, 3, 32, 73, 119, 122, 123f, 125f, 126f, 140, 141t, 142t, 143
Bulgaria, 113t, 117, 118t
Burkina Faso, 136, 137, 138

Cambodia, 29, 51
Cameroon, 13, 136, 137, 138, 138t
Canada, 69, 113t, 118t, 122, 123f, 125f, 126
Caribbean countries. See Latin America and Caribbean
carrying out national assessment, 2, 25–29, 52t
census and sample-based approaches to national assessment, 32–34, 78, 81
Central African Republic, 137
Chad, 13, 137
Chile
   affective outcome assessment in, 38
   case study of, 81, 82, 99–101
   census-based approaches to national assessment in, 34
   communication and use of results in, 48
   frequency of assessment in, 43
   government commitment to assessment process in, 79
   implementation agency in, 81
   LLECE in, 140, 141t, 142t, 143f
   objectives for national assessment in, 20, 21
   PISA in, 119, 120
   reward program in, 20, 21, 99–101
   TIMSS in, 112
China, 112, 114, 116, 119, 120
Chinese Taipei, 73t, 112, 113t, 116, 120
Colombia, 31, 32, 38, 112, 116, 118t, 120, 140, 141t, 142t, 143f
common errors, 53–60
CONFEMEN (Conférence des Ministres de l’Education des Pays ayant le Français en Partage), 135–36
   See also Programme d’Analyse des Systèmes Éducatifs de la CONFEMEN (PASEC)
Congo, Democratic Republic of, 13, 137
content analysis assessment, 35
contextual factors and student achievement, 38–39
cost, 43, 68, 75
   implementation agencies and, 49
   of national assessment, 49–52
   staffing requirements and, 50
   of statistical analyses and procedures, 50
   of test administration, 51
   TIMSS and, 75
Costa Rica, 32, 140
Côte d’Ivoire, 136, 137, 138
country case studies, 3, 4, 81–82, 85–107
   Chile, 99–101
   India, 85–87
   Nepal, 97–99
   South Africa, 92–94
   Sri Lanka, 95–97
   Uganda, 103–7
   United States, 101–3
   Uruguay, 90–92
   Vietnam, 87–89
Cuba, 32, 113t, 140, 141t, 142t, 143
curriculum
   development and assessment, 9, 31, 37–38, 42, 49, 51, 56, 78
   domains, performance in, 34–35, 44–45, 58
   frequency of assessment and, 43
   international studies and, 62, 63, 64t, 67, 69–71
   narrowing of, 54
   policy makers and, 72, 80
   public examinations and, 14, 15t, 16
   reporting of assessment data and, 44
   teams and, 28–29
   TIMSS and, 114
Cyprus, 113t, 118t
Czech Republic, 69, 118t, 123f, 125f, 126f

Dakar Framework for Action (2000), 8–9
decisions in national assessment, 2, 23–52
   administration of tests and questionnaires, 29–30, 50, 52t
   assessment framework, 34–39, 52t
   communication and use of results, 48–49, 52t
   cost components, 49–52
   frequency of assessment, 43, 52t, 81
   how achievement will be assessed, 39–43, 50, 52t
   national steering committees (NSCs), 23–25, 28, 32, 39, 49, 52t
   policy guidance for, 23–25, 52t, 78–79
   population for assessment, 30–34, 52t, 78
      sample vs. census-based, 32–34, 52t, 78, 81
   primary responsibility for, 52
   reporting of student achievement, 44–46, 52t
   statistical procedures, 46, 48, 50, 52t
   summary of, 45
   what will be assessed, 34–39, 52t
   who should carry out national assessment, 2, 25–29, 52t
Denmark, 123f, 125f, 126f
design of a national assessment, 2–3, 53–60
Developing Tests and Questionnaires for a National Assessment of Educational Achievement (Volume 2 in series), 4
Dirección Nacional de Información y Evaluación de la Calidad Educativa (DiNIECE, Argentina), 25, 27
dissemination and use of findings, 2, 3, 79–80
Djibouti, 137
Dominican Republic, 13, 140, 141t, 142t, 143f

economic status and student achievement, 38–39, 58, 66, 70, 71, 83, 91, 96t, 100, 133, 143
Education for All (EFA, 2000), 1, 9, 135
Egypt, 92, 113t
elements of a national assessment, 12–14
England, 43, 69, 113t, 118t
Estonia, 113t
Ethiopia, 10, 29, 48
Finland, 122, 123f, 125–26
Framework and Specifications (IEA), 35–36
France, 32, 118t, 123f, 125f, 126f
frequency of assessment, 43, 52t, 81, 85, 87, 90, 93, 97, 99, 102, 104, 109, 115, 119, 132, 136

Gabon, 137
gender
   country cases and, 85, 86, 89t, 95, 98, 104, 105
   mathematics and science education and, 20
   PASEC and, 139
   PIRLS and, 117
   PISA and, 122, 125
   policy formation support by national assessment and, 82
   SACMEQ and, 130, 134, 135, 137
   school attendance and, 72
   TIMSS and, 114
Germany, 62, 70, 118t, 123f, 125f, 126f
Ghana, 42, 44, 72, 73t, 92, 113t
globalization and international assessment, 1, 66
government commitment, importance of, 79
Greece, 118t, 125f, 126f
Guinea, 13, 72, 136, 137

Honduras, 32, 140, 141t, 142t, 143f
Hong Kong, 112, 113t, 114, 116, 118t, 119, 125f, 126f
Hungary, 113t, 118t, 123f, 125f, 126f

Iceland, 69, 118t, 123f, 125f, 126f
IEA. See International Association for the Evaluation of Educational Achievement
implementation, 3, 53–60
   options for, 25, 26–27t
implementation agencies
   administration of tests and questionnaires by, 30
   carrying out national assessment and, 25, 26–27t, 29
   communication and use of results and, 49
   costs and, 49
   decisions, summary of, 52t
   responsibility for deciding how achievement will be assessed, 43
   what will be assessed, determination of, 34
Implementing a National Assessment of Educational Achievement (Volume 3 in series), 4–5, 29–30, 51
India, 13, 72, 82
   case study of, 85–87
Indonesia, 73, 112, 113t, 116, 119, 122, 125f, 126
institutionalization of national assessment, 79
Inter-American Bank, 90
international assessment of student achievement, 3, 61–75, 109–26
   advantages of, 66–70
   growth in, 63–66
   problems with, 70–75
International Association for the Evaluation of Educational Achievement (IEA), 35–36, 63, 74, 83, 129
   See also Progress in International Reading Literacy Study (PIRLS); Trends in International Mathematics and Science Study (TIMSS)
International Institute for Educational Planning (IIEP), 127
   See also Southern and Eastern Africa Consortium for Monitoring Educational Quality (SACMEQ)
international studies, 3, 4, 82–83, 109–26
   Programme for International Student Assessment (PISA), 37–38, 40, 119–26
   Progress in International Reading Literacy Study (PIRLS), 114–18
   Trends in International Mathematics and Science Study (TIMSS), 34, 40, 83, 109–14
Iran, Islamic Republic of, 112, 113t, 116, 118t
Ireland, 40, 70, 123f, 125f, 126f
Israel, 112, 113t, 116, 118t, 119
Italy, 113t, 118t, 123f, 125f, 126f
item-level information, reporting of, 39, 44, 86, 91

Jamaica, 32
Japan, 43, 68, 112, 113t, 119, 122, 123f, 125, 126f
Jomtien Declaration (1990), 8
Jordan, 32, 120

Kazakhstan, 13
Kenya, 46, 127, 133, 134t, 135
Korea, Republic of, 69, 73t, 112, 113t, 114, 119, 122, 123f, 125, 126
Kuwait, 69, 112, 116, 118t
Kyrgyzstan, 120

Laboratorio Latinoamericano de Evaluación de la Calidad de la Educación (LLECE)
   findings, 140–44
   framework, 139
   instruments, 140
   participants, 140
language of assessment, 9, 32, 42–43, 46, 71, 93, 94, 96, 97t, 105, 117
Lao People’s Democratic Republic, 13
Latin America and Caribbean
   comparative data and, 74
   institutes in, 79
   national assessment in, 8
   PISA in, 119, 120
   school attendance in, 72
   surrogate public examinations in, 34
   See also Laboratorio Latinoamericano de Evaluación de la Calidad de la Educación (LLECE)
Latvia, 69, 113t, 114, 118t, 123f, 125f, 126f
league tables, 21, 33, 67, 122
   See also ranking of test scores
Lebanon, 113t
Lesotho, 45, 49, 127
Liechtenstein, 123f, 125f, 126f
life skills assessment, 38
literacy, assessment framework for, 35–37
   See also Progress in International Reading Literacy Study (PIRLS)
Lithuania, 113t, 114, 118t
Luxembourg, 123f, 125f, 126f

Macao, 120, 125f, 126f
Macedonia, 113t, 118t
Madagascar, 137, 138
Malawi, 18, 127, 132, 134
Malaysia, 112, 113t
Maldives, 29
Mali, 136, 137
mastery standards, 45–46, 47t
Mauritania, 13, 137
Mauritius, 127, 132, 133, 134t, 135, 137
media and international assessment, 67
MESyFOD (Programa de Modernización de la Educación Secundaria y Formación Docente), 90
Mexico, 32, 34, 119, 123f, 125f, 126f, 140, 141t, 142t, 143f
Millennium Development Goals (MDGs), 21
ministries of education (MOE)
   carrying out national assessment by, 2, 25, 26t, 29
   communication and use of results and, 49
   costs of national assessment and, 51, 52
   as implementation agencies, 81
   what will be assessed, determination of, 39
Moldova, 113t, 118t
Monitoring Learning Achievement, 38, 92, 95
Morocco, 73t, 92, 112, 113t, 116, 118t
Mozambique, 72, 127, 132
multiple-choice format, 40, 41b

Namibia, 18, 127, 133, 134
National Administration for Public Education, 90
National Assessment of Educational Progress (NAEP, U.S.), 18–19, 20f, 40, 51, 62, 101–3
National Assessment of English Reading (Ireland), 40
national examination board, 81
national steering committees (NSCs), 23–25, 28, 32, 39, 49, 52t
Nepal, 19, 82
   case study of, 97–99
Netherlands, 113t, 117, 118t, 122, 125f, 126f
New Zealand, 68, 113t, 118t, 123f, 125f, 126f
Niger, 136, 137
Nigeria, 32
North America, primary school enrollment in, 72
Norway, 69, 113t, 118t, 122, 123f, 125f, 126f

objectives for national assessment, 2, 17–21
open-ended questions in assessment, 40, 41b
Organisation for Economic Co-Operation and Development (OECD), 63, 119
   See also Programme for International Student Assessment (PISA)
outcomes of education, emphasis on, 1–2

Pakistan, 13
Palestinian National Authority, 113t
Paraguay, 140, 141t, 142t, 143f
PASEC. See Programme d’Analyse des Systèmes Éducatifs de la CONFEMEN
Peru, 13, 119, 120, 140, 141t, 142t
Philippines, 68, 73t, 112, 113t
pilot testing, 5, 12, 28, 51, 55, 56, 111
   See also pretesting
PIRLS. See Progress in International Reading Literacy Study
PISA. See Programme for International Student Assessment
Poland, 122, 123f, 125f, 126f
policy guidance for national assessment, 23–25, 52t, 78–79, 82
   See also Appendices A and C
population for assessment
   census-based vs. sample-based, 32–34, 52t, 78, 81
   decisions in national assessment and, 30–32, 52t
   in international assessment, 71–72
   subgroups, 2, 9
Portugal, 125f, 126f
pretesting, 4, 28
   See also pilot testing
primary school enrollment, 72
Programa de Modernización de la Educación Secundaria y Formación Docente (MESyFOD), 90
Programme d’Analyse des Systèmes Éducatifs de la CONFEMEN (PASEC), 135–39
   findings, 137–39
   framework, 135–36
   instrumentation, 136–37
   participants, 137
   population for assessment and, 31
Programme for International Student Assessment (PISA), 119–26
   assessment framework of, 37–38
   findings, 66, 83, 122–26
   framework, 83, 119
   instruments, 40, 120–22
   participants, 63, 90, 119–20
   TIMSS compared, 63, 64–65t
   use of data from, 62, 68, 69–70
Progress in International Reading Literacy Study (PIRLS), 114–18
   findings, 117–18
   framework, 114–15
   frequency of, 43
   instruments, 35–37, 63, 116
   participants, 116
public examination vs. national assessment of student achievement, 7–8, 14–16
purposes of national assessment. See objectives for national assessment

Qatar, 116, 120

ranking of test scores, 58, 67, 73–74
   See also league tables
regional studies, 3, 4, 83, 127–44
   Laboratorio Latinoamericano de Evaluación de la Calidad de la Educación (LLECE), 139–43
   Programme d’Analyse des Systèmes Éducatifs de la CONFEMEN (PASEC), 135–39
   Southern and Eastern Africa Consortium for Monitoring Educational Quality (SACMEQ), 127–35
reliability, 13, 43, 50, 80, 98
Reporting and Using Results from a National Assessment of Educational Achievement (Volume 5 in series), 5, 44, 49
reporting of student achievement
   item-level information, 44
   mastery standards, 45–46, 47t
   performance in curriculum domains, 44–45
   performance standards, 45
report writing, 59–60
research institutes, 25, 26t, 29, 81
responsibility for national assessment, 26–27t, 52t
   See also Appendix A
reward programs for performance, 20, 21, 99–101
Romania, 113t, 118t
Russian Federation, 62, 68, 113t, 118t, 123f, 125f, 126f

SACMEQ. See Southern and Eastern Africa Consortium for Monitoring Educational Quality
sample or population for assessment, 32–34, 52t, 78, 81
sanctions and performance, 21, 33–34
Sarva Shiksha Abhiyan (SSA) program (India), 85–86
Saudi Arabia, 73t, 112, 113t
Scotland, 69, 113t, 118t
Senegal, 136, 137, 138
Serbia, 113t, 125f, 126
Seychelles, 127, 132, 133
Sierra Leone, 24
Singapore, 69, 73t, 112, 114, 116, 118t
Sistema de Medición de la Calidad de la Educación (SIMCE), 79, 99–101
Slovak Republic, 69, 113t, 118t, 125f, 126f
Slovenia, 113t, 118t
SNED (National System of Teacher Performance Assessment in Publicly Supported Schools, Chile), 99–101
South Africa, 42, 70, 73t, 75b, 81, 82, 112, 113t, 116, 127, 132
   case study of, 92–94
Southern and Eastern Africa Consortium for Monitoring Educational Quality (SACMEQ), 127–35
   assessment of teacher achievement, 39
   comparative data and, 74
   findings, 132–35
   framework, 127–28
   instruments, 45, 128–32
   participants, 63, 92, 132
   population assessed and, 31
   use of data from, 68
South Korea. See Korea, Republic of
Spain, 69, 123f, 125f, 126f
Sri Lanka, 31, 45–46, 48, 82
   case study of, 95–97
SSA (Sarva Shiksha Abhiyan program, India), 85–86
statistical analyses and procedures
   common errors, 58–59
   costs of, 50
   decisions in national assessment and, 46, 48, 52t
   issues in, 57–59
   recommended activities, 57–58
subnational assessment, 3
Sub-Saharan Africa, primary school enrollment in, 72
Swaziland, 127, 132
Sweden, 69, 113t, 117, 118t, 123f, 125f, 126f
Switzerland, 68, 70, 123f, 125f, 126f
Syrian Arab Republic, 112

Tanzania, 127, 132, 133, 134
teacher achievement, assessment of, 13, 39, 86, 88, 132
teacher training, 1, 6, 7, 14, 18, 19, 24, 37, 49, 59, 67, 68, 95, 98, 102, 114, 136, 137, 138
   in-service, 50, 91, 138
teams to carry out national assessment, 28–29
technical capacity
   carrying out national assessment and, 25, 26t, 27t
   communication and use of national assessment and, 48
   conditions for national assessment and, 78
   donors and, 51
   effect of lack of, 38
   international studies and, 83
   long-term government assistance and, 79
   statistical procedures and, 57–59, 81
   of users, 80
tests and questionnaires, administration of, 29–30, 50, 52t, 74
Thailand, 112, 119, 125f, 126f
Togo, 136, 137
Trends in International Mathematics and Science Study (TIMSS), 109–14
   cost of, 75
   findings, 83, 112–14
   framework, 83, 109–10
   participants, 63, 66, 92–93, 112
   PISA compared, 63, 64–65t
   SACMEQ and, 128–29
   student motivation and, 34
   use of data from, 62, 68–69
Trinidad and Tobago, 116
Tunisia, 73, 92, 112, 113t, 120, 122, 125f, 126
Turkey, 116, 118t, 120, 125f, 126f

Uganda, 39, 42, 82, 127
   case study of, 103–7
UNESCO (United Nations Educational, Scientific, and Cultural Organization), 127
Unidad de Medición de Resultados Educativos (UMRE), 90–92
United Arab Emirates, 13
United Kingdom, 123f
United States
   case study of, 101–3
   NAEP. See National Assessment of Educational Progress (NAEP, U.S.)
   objectives for national assessment in, 18–19, 101–2
   PIRLS, 118t
   PISA, 122, 123f, 125, 126f
   population for assessment in, 30, 31
   subnational assessment in, 3
   surrogate public examinations in, 34
   TIMSS, 69, 113t, 114
universities and national assessment, 8, 12b, 25, 26t, 29, 68
Uruguay
   case study of, 81, 82, 90–92
   census and sample-based approaches to national assessment in, 31, 32
   communication and use of results in, 49
   objectives for national assessment in, 20, 21
   PISA, 120, 125f, 126f
   policy formation support by national assessment in, 82
validity, 4, 5, 12, 34, 42, 49, 56, 59, 70, 80
Venezuela, República Bolivariana de, 140, 141t, 142t, 143f
Vietnam
   case study of, 82, 87–89
   communication and use of results in, 49
   examples of questions in national assessment of, 10, 11b
   objectives for national assessment in, 17
   performance standards in, 45
   policy formation support by national assessment in, 82
   statistical procedures in, 46, 81
   teacher achievement in, 13, 39, 88

Western Europe, primary school enrollment in, 72
World Bank, 90

Yemen, 72, 112

Zambia, 13, 18, 29, 127, 133, 134, 135
Zanzibar, 17, 127, 134
Zimbabwe, 127, 128, 129, 132, 133t, 134, 135
                                ECO-AUDIT
              Environmental Benefits Statement

The World Bank is committed to preserving endangered forests and natural
resources. The Office of the Publisher has chosen to print Assessing National
Achievement Levels in Education on recycled paper with 30 percent post-consumer
waste, in accordance with the recommended standards for paper usage set by
the Green Press Initiative, a nonprofit program supporting publishers in using
fiber that is not sourced from endangered forests. For more information, visit
www.greenpressinitiative.org.

Saved:
• 6 trees
• 4 million BTUs of total energy
• 560 lbs. of net greenhouse gases
• 2,324 gallons of wastewater
• 298 lbs. of solid waste
National Assessments of Educational Achievement
Effective assessment of the performance of educational systems is a key component in
developing policies to optimize the development of human capital around the world.
The five books in the National Assessments of Educational Achievement series introduce
key concepts in national assessments of student achievement levels, from policy issues
to address when designing and carrying out assessments through test development,
sampling, data cleaning, statistics, report writing, and the use of results to improve
educational quality.




As knowledge increasingly replaces raw materials and labor as a key resource in
economic development, the availability of human knowledge and skills is critical in
determining a country’s rate of economic development and its competitiveness in
international markets. The growing use of national assessment capacity has enabled
ministries of education across the world to describe national levels of learning achieve-
ment in key subject areas; compare achievement levels of key subgroups, such as boys
and girls, ethnic groups, urban and rural students, and public and private school students;
and provide evidence to support claims about standards of student achievement.
   Despite growth in assessment activity, the potential value of the data that assessments
can provide remains underappreciated. The requisite skills to conduct assessments
require development, and even countries that carry out national assessments or partici-
pate in international ones may not yet be able to fully exploit the information yielded.
   Assessing National Achievement Levels in Education describes the purposes and features
of national assessments, as well as issues in designing, implementing, analyzing, and
reporting. It also describes major international, regional, and national assessments in
a range of countries.
   This book will be of interest to national, regional, and state governments; research
institutions; and universities.




                                                                         ISBN 978-0-8213-7258-6




                                                             SKU 17258