THE WORLD BANK                 EDT100
Discussion Paper
EDUCATION AND TRAINING SERIES
Report No. EDT100
The Specification of Earnings Functions:
Tests and Implications
C. R. S. Dougherty
and
E. Jimenez
June 1987
Education and Training Department                               Operations Policy Staff
The views presented here are those of the author(s), and they should not be interpreted as reflecting those of the World Bank.






Discussion Paper
Education and Training Series
Report No. EDT100
The Specification of Earnings Functions:
Tests and Implications
C.R.S. Dougherty and E. Jimenez
June 1987
Research Division
Education and Training Department
The World Bank
The World Bank does not accept responsibility for the views expressed
herein, which are those of the author(s) and should not be attributed to
the World Bank or its affiliated organizations. The findings,
interpretations, and conclusions are the results of research or analysis
supported by the Bank; they do not necessarily represent official policy of
the Bank.
Copyright © 1987 The International Bank for Reconstruction and Development/
The World Bank






ABSTRACT
Many studies of the returns to education have relied on the
Mincerian specification for the earnings function. This study uses data
from a random sample of adult male workers of the 1980 Brazilian census to
test the empirical validity of the assumptions embodied in this
specification, with the following findings: the evidence supports the
assumption that the appropriate regressand is the logarithm of earnings,
but it does not support the implicit assumption that there is no
interaction between the effects of education and work experience, or the
assumption that a single function is appropriate for modelling both early
and mature earnings. We find that the Mincerian specification leads to
upwardly biased estimates of the returns to education, particularly at the
primary level.






I. Introduction
Earnings functions have been widely used to estimate the returns
to education and training -- estimates which have had a significant effect
on the policy debate concerning educational investment. Most studies have
adopted a Mincerian specification in which the core regressors are years of
schooling or schooling dummies, work experience and work experience
squared. This specification has been popular because the coefficient of
the schooling variable can conveniently be interpreted as a crude estimate
of the rate of return to schooling, but it embodies three strong assump-
tions:
(i)   The appropriate definition of the dependent variable
is the logarithm of earnings, as opposed to earnings
as such or any other functional form.
(ii)  There is no interaction between the contributions of
the schooling and work experience variables to earn-
ings.
(iii) A single function can be used to model lifetime earn-
ings, making no distinction between early and mature
labour market experience.
So powerful is the hold of the Mincerian model that these
assumptions are seldom even mentioned. And yet it is commonly accepted
that the earnings functions of those with relatively little education are
much flatter than those with more education, and that many entry-level jobs
are effectively training slots whose compensation is determined differently
from those of mainstream occupations 1.
1    For discussions of the nature of entry-level jobs and the ear-
nings of those acquiring human capital through on-the-job
training, see Thurow (1980) and Becker (1965).
The first objective of this paper is to subject these assumptions
to overdue tests. The second is to evaluate their practical implications.
Convenience is a legitimate consideration in model specification, for only
an academic purist would argue against the use of a simplified model if it
gave results similar to those derived from more elaborate ones with much
less labor. In the present context, the obvious criterion is the impact of
the assumptions on estimates of rates of return to different levels of
schooling.
The rest of the paper proceeds as follows. In the second section,
we briefly describe the data base, which is a random sample of urban males
in Brazil from the 1980 census. Then, in the third section, we investigate
whether the semi-logarithmic specification of the earnings equation yields
the best fit by testing the explanatory power of alternative transforma-
tions of the dependent variable, the earnings term. We also test whether
the specification conforms with the basic assumptions of the classical
regression model: homoscedasticity, and, for the validity of conventional
tests, normality of the error term. In the fourth section, we investigate
the specification of the right-hand side of the regression model. Our main
concern is the bias caused by neglecting interactive effects between years
of work experience and level of schooling. We also evaluate alternative
measures of the experience term at low levels of schooling and the impact
of certification on the measured returns to schooling and training.
Finally, in the last section of this paper, we examine the sensitivity of
estimated rates of return to the alternative specifications considered in
the paper.
II. Data
The 3 percent national sample for the Brazilian 1980 census covers
3.5 million individuals in 0.81 million households (IBGE 1985). Out of
this base, and for the purpose of making the statistical analyses more
manageable, a random subsample (stratified by state) of 200 thousand
individuals (in 40 thousand households) was drawn. This subsample was
further refined to include only males aged 15 to 65, living in urban areas
and who reported positive earnings in their main occupation. This resulted
in a total sample size of 22,875 individuals.
The means and standard deviations of the key variables used in the
subsequent analysis are described in Table 1. They are divided by
employment status of the individual. We focus our analysis on the private
sector subsample, since labor earnings for this group are more likely to be
immune from potential biases due to the non-competitive nature of the
public sector and the difficulty in measuring self-employed earnings.
However, where warranted, we discuss differences in the results of the
private versus the other sectors.
Table 1: Earnings, experience and schooling levels in Brazil
by level of economic activity, 1980

                                             Employed in              Self-
Variable                                  Private       Public     Employed
---------------------------------------------------------------------------
Monthly earnings in cruzeiros (Y)           12865        18470        15123
  (means, std deviations)                 (16976)      (22917)      (24918)
Experience in years                          16.9         22.2         23.9
  (X = min{Age-schooling-6, Age-15})       (11.8)       (12.3)       (13.3)
  (means, std deviations)
Level of School Certificate
  (proportions, numbers in sample)
None (NIL)                                   0.37         0.23         0.49
                                          (5,790)        (490)      (2,565)
Primary lower (PL)                           0.40         0.31         0.36
                                          (6,180)        (666)      (1,889)
Primary upper (PU)                           0.11         0.16         0.07
                                          (1,733)        (341)        (368)
Secondary general (SECG)                     0.05         0.10         0.03
                                            (760)        (215)        (150)
Secondary technical (SECT)                   0.03         0.05         0.02
                                            (480)         (97)         (83)
Higher scientific (HISCI)                    0.01         0.05         0.01
                                            (223)        (109)         (73)
Higher mgt./agric. (HIMGT)                   0.01         0.02         0.00
                                            (194)         (52)         (23)
Higher soc. sci. (HISOC)                     0.01         0.06         0.01
                                            (146)        (137)         (68)
Years of Education
  (means, std deviations)
Primary lower (YRSPL)                        3.07         3.46         2.72
                                           (1.46)       (1.20)       (1.61)
Primary upper (YRSPU)                        1.21         2.08         0.80
                                           (1.67)       (1.87)       (1.46)
Secondary (YRSSEC)                           0.44         1.13         0.30
                                           (1.13)       (1.65)       (0.97)
Higher (YRSHI)                               0.18         0.72         0.16
                                           (0.88)       (1.69)       (0.85)
---------------------------------------------------------------------------
Sample Size                                15,523        2,127        5,225
In Table 1 we have divided the first eight years of Brazilian
primary education into four years each of lower and upper. This is
consistent with the Brazilian educational reform of the early 1970s in which
grades 5-8 were redesignated from high school to primary.
Another important issue is the definition of work experience.
Although work experience is almost invariably an important variable in
determining earnings - usually the only major one apart from schooling -
lack of data usually leads to its being estimated by an expression of the
type (age - years of schooling - 6). This procedure can be inappropriate
in developing countries where much of the labour force has had little or no
schooling, for it implies that "work experience" gained during childhood
should be treated on the same level as adult work experience. In the
present study, work experience has been estimated as the smaller of the
above expression and (age - 15), years out of school before the age of 15
not being counted.
Graphically, the effect of the revised definition is to shift the
experience-earnings profiles for those with the lowest levels of education
to the left, the shift being greatest for the lowest levels. Those with no
certificate, and hence on average two or three years of schooling, would
have six fewer years of work experience under the revised definition than
under the traditional one. For lower primary the average shift would be
about two years, and higher levels of education would not be affected.
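The two definitions of work experience can be compared with a short sketch. The function name and the example individual are illustrative assumptions; only the two expressions, (age - years of schooling - 6) and (age - 15), come from the text.

```python
def work_experience(age, years_of_schooling):
    """Return (traditional, revised) estimates of work experience in years.

    traditional: age - years of schooling - 6
    revised:     the smaller of the traditional estimate and (age - 15),
                 so that years out of school before age 15 are not counted.
    """
    traditional = age - years_of_schooling - 6
    revised = min(traditional, age - 15)
    return traditional, revised


# Hypothetical individual: aged 35 with 3 years of schooling.
traditional, revised = work_experience(35, 3)
shift = traditional - revised  # years of childhood "experience" discounted

# Hypothetical individual with 12 years of schooling: both definitions agree.
unaffected = work_experience(35, 12)
```

For the first individual the revised definition removes six years, in line with the shift described above for those with no certificate; the second is unaffected because schooling ended after age 15.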
III. Specification of the dependent variable
The most popular specification of the earnings (y) function is
parabolic, containing schooling (s), experience (x) and experience squared
as explanatory variables. For the ith individual, this can be represented
as:
(1)      \ln Y_i = a + b s_i + c x_i + d x_i^2 + u_i
Its popularity stems from Mincer's pioneering work, which showed that this
specification is a good linear approximation of the earnings function
derived from a human capital model, under several simplifying assumptions
about the complex dependence of earnings on schooling and postschool
investments. In this specification, the coefficient of the variable
measuring years of schooling can be interpreted as the private rate of
return to schooling.
A variation on equation (1) has also been widely used since it
allows the estimated rate of return to vary by level of schooling:
(2)      \ln Y_i = a + \sum_k b_k D_{ik} + c x_i + d x_i^2 + u_i
where D_{ik} is a dummy variable for the kth level of education and
k stands for the level of education (i.e., k = lower and upper
primary, general and technical secondary, and various higher levels). In
this specification, the rate of return to the kth level of education (r_k)
has been estimated by comparing the coefficient of D_k with that of D_{k-1} and
dividing by the number of years of schooling at the kth level (n_k)
(Psacharopoulos, 1981):
(3)      r_k = (b_k - b_{k-1}) / n_k
In order to simplify the interpretation of the coefficients and to focus
attention on the methodological comparisons, most of the analysis is done
on variants of equation (2).
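As a worked instance of equation (3), the sketch below applies the formula to the two primary-level coefficients reported in Table 3 for the specification without interactives. The four-year length of each primary cycle follows the division of primary education described in Section II; the function name is an assumption made for illustration.

```python
def rate_of_return(b_k, b_prev, n_years):
    """Equation (3): r_k = (b_k - b_{k-1}) / n_k."""
    return (b_k - b_prev) / n_years


# Coefficients of the schooling dummies (Table 3, without interactives).
# "No certificate" is the omitted base category, so its coefficient is zero.
b_nil = 0.0
b_pl = 0.36720   # lower primary
b_pu = 0.69072   # upper primary

r_pl = rate_of_return(b_pl, b_nil, 4)  # lower primary, 4 years
r_pu = rate_of_return(b_pu, b_pl, 4)   # upper primary, 4 years
```

Under these assumptions the crude rates of return are roughly 9.2% per year for lower primary and 8.1% per year for upper primary.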
The theoretical foundation for the semi-logarithmic specification
is so widely accepted that it has seldom been subjected to empirical tests.
However, the link between theory and the estimating equation rests on a set
of ingenious but empirically debatable assumptions. As summarized by
Blinder (1976), among the most important are: (i) in the absence of
post-school investments, an individual's age-earnings profile would be flat
and the present discounted value of lifetime earnings would be the same for
all individuals, regardless of how long they stayed in school; (ii) the
number of years spent at work is independent of the number of years spent
in school; (iii) the return to all post-school investment in human capital
is a constant; and (iv) during schooling, no time is spent in the labor
force, whereas after schooling, everyone works full-time.
Alternative assumptions would result in altered regression
equations, as considered by Mincer (pp. 83-92). For example, the assumption
of a linear decline in post-school investment in human capital over the
life-cycle could lead to an estimating equation that has earnings (instead
of its log) on the left hand side. In this section, we consider the
empirical validity of using the logarithm of earnings as the dependent
variable.
A. Empirical validity of the semi-log dependent variable:
A general transformation, widely used in the applied economics
literature, is applied to the Brazilian data base to test for alternative
functional forms. The Box-Cox transformation takes the following general
form:
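(4)      Y_i^{(\lambda)} = a + \sum_k b_k D_{ik} + c x_i + d x_i^2 + u_i,
where earnings, Y, is transformed by the parameter \lambda (lambda) such that:
         Y_i^{(\lambda)} = (Y_i^{\lambda} - 1) / \lambda    for \lambda \neq 0,
         Y_i^{(\lambda)} = \ln Y_i                          for \lambda = 0.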
The attractive characteristic of (4) is that the functional form
is dictated by the transformation parameter λ (lambda), which is itself
estimated as the value that maximizes the log-likelihood function. Note that,
if the estimated λ = 1, the earnings function is linear in the dependent
variable; if λ = 0, the appropriate functional form would be semi-logarithmic,
as postulated by Mincer's basic human capital model. Further we can construct a
confidence interval around the estimated value of λ to see if alternative
functional forms (transformations) are also consistent with the data. In
our case, we are particularly interested in testing the appropriateness of
the simpler and oft-used functional forms, such as the linear and the
semi-logarithmic.
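The transformation itself is simple to state in code. The following is a minimal sketch of the Box-Cox transform for a single positive earnings value; the function name and the tolerance used to detect a zero parameter are assumptions made for illustration.

```python
import math


def box_cox(y, lam, tol=1e-12):
    """Box-Cox transform of a positive value y for parameter lam.

    (y**lam - 1) / lam  when lam is not zero,
    ln(y)               when lam is zero (the limiting case).
    """
    if y <= 0:
        raise ValueError("the Box-Cox transform requires y > 0")
    if abs(lam) < tol:
        return math.log(y)
    return (y ** lam - 1.0) / lam


linear_case = box_cox(5.0, 1.0)      # y - 1: linear in earnings
log_case = box_cox(5.0, 0.0)         # ln(y): the semi-logarithmic form
near_log_case = box_cox(5.0, 1e-6)   # approaches ln(y) as lam tends to 0
```

The last line illustrates why the logarithm is treated as the member of the family at zero: the transform approaches ln(y) continuously as the parameter tends to zero.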
The estimation of (4) requires the maximization of a nonlinear
likelihood function. It has been shown (Spitzer, 1982) that there are
alternative ways of consistently estimating the parameters through simpler
and available computer algorithms, such as nonlinear least squares or
iterative OLS. In principle, these techniques involve the repeated OLS
estimation of (4) for various values of λ. Spitzer and others have shown
that the maximum likelihood estimate of λ is equivalent to the value for which
the estimated variance of the disturbances, and hence the error sum of
squares, is minimized 2.
2    To ensure the comparability of the sum of squared errors for
different values of λ, the equation can be rendered scale invariant
through the use of a scaling trick originally attributed
to Zarembka (1968). The trick is to multiply through (1) by y'
where y' is the geometric mean of y. An ordinary least squares
computer program can then simply be applied to the transformed
version of (1) and modified to repeatedly estimate a* and b*
(where a* = (a - y'{λ})/y', the vector b* = b/y', y' = the geometric
mean of y) for different values of λ. The error sum of squares is
computed in each case. We iterate for different values until
the error sum of squares is minimized.
We utilize these techniques, estimate (4) and compare the results with
estimated parameters of the linear and semi-logarithmic specifications.
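The repeated-OLS procedure can be sketched as a grid search. This is an illustration under stated assumptions, not the authors' program: earnings are divided by their geometric mean before transformation (a common way of making the error sums of squares comparable across parameter values), the data are synthetic, and all function names are hypothetical.

```python
import numpy as np


def box_cox_scaled(y, lam):
    """Box-Cox transform of y after division by its geometric mean."""
    z = y / np.exp(np.mean(np.log(y)))
    return np.log(z) if abs(lam) < 1e-12 else (z ** lam - 1.0) / lam


def error_sum_of_squares(y, X, lam):
    """ESS from an OLS regression of the scaled transform on X."""
    t = box_cox_scaled(y, lam)
    beta, *_ = np.linalg.lstsq(X, t, rcond=None)
    resid = t - X @ beta
    return float(resid @ resid)


def grid_search_lambda(y, X, grid):
    """Return the grid value minimizing the ESS, and all ESS values."""
    ess_values = np.array([error_sum_of_squares(y, X, lam) for lam in grid])
    return float(grid[int(np.argmin(ess_values))]), ess_values


# Synthetic data (NOT the Brazilian sample): earnings are generated from a
# semi-logarithmic model, so the search should settle near zero.
rng = np.random.default_rng(0)
x = rng.uniform(0, 40, size=2000)
X = np.column_stack([np.ones_like(x), x, x ** 2])
y = np.exp(8.0 + 0.05 * x - 0.0008 * x ** 2 + rng.normal(0, 0.5, size=2000))

grid = np.linspace(-1.0, 1.0, 41)
lam_hat, ess_values = grid_search_lambda(y, X, grid)
```

Because the regressor matrix includes a constant, rescaling earnings only changes the intercept and the scale of the coefficients, so the minimizing parameter value is unaffected by the units in which earnings are measured.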
The value of λ for which the error sum of squares is minimized is
-0.13. A 95% confidence interval can be constructed by noting that the
maximized log-likelihood function is:
(5)      \max \ln L(\lambda) = -(N/2) \ln s^2
where s^2 is the maximum likelihood estimate of the variance of
disturbances of the regression and N is the number of observations
(Spitzer, 1986). This formula is used to plot the maximized log likelihood
over the whole parameter space and to locate the maximizing λ. Large sample
theory can be used to test hypotheses about the parameters. Twice the difference
in the logarithmic likelihood between a null and alternative hypothesis is
distributed χ² with the degrees of freedom depending on the number of
parameters specified in the null hypothesis (Zarembka 1968; Heckman and Polacheck
1974).
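A likelihood-ratio confidence region of this kind can be read off a grid of log-likelihood values. The sketch below assumes a single restricted parameter (one degree of freedom, 95% critical value 3.841); the log-likelihood values are hypothetical, chosen by hand for illustration, and the function names are not from the source.

```python
import math

CHI2_95_1DF = 3.841  # 95% critical value of chi-square with 1 degree of freedom


def concentrated_loglik(ess, n):
    """Equation (5) up to a constant: -(N/2) ln s^2, with s^2 = ESS / N."""
    return -0.5 * n * math.log(ess / n)


def lr_interval(grid, logliks, crit=CHI2_95_1DF):
    """Smallest and largest grid values not rejected by the LR test.

    A value is kept when twice the gap between the maximum log-likelihood
    and the log-likelihood at that value does not exceed the critical value.
    """
    best = max(logliks)
    kept = [lam for lam, ll in zip(grid, logliks) if 2.0 * (best - ll) <= crit]
    return min(kept), max(kept)


# HYPOTHETICAL log-likelihood profile: quadratic with its peak at -0.16.
grid = [round(-0.30 + 0.01 * i, 2) for i in range(31)]
logliks = [-1200.0 * (lam + 0.16) ** 2 for lam in grid]

low, high = lr_interval(grid, logliks)
```

In practice the log-likelihood at each grid point would come from `concentrated_loglik` applied to the error sum of squares of the corresponding regression.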
The plot of the log-likelihood values for different values of
λ is shown in Figure 1. According to this figure, the maximum likelihood
value of λ for the private sector is -0.16. The 95% confidence region
around this estimate runs from -0.20 to -0.12. Thus, the earnings function
specification is significantly different from both the linear and the
semi-logarithmic forms. However, when comparing the two simple specifications,
the semi-logarithmic form dominates the linear version. The large
size of the sample causes asymptotic likelihood ratio tests to reject both
the null hypotheses that λ equals one and that it equals zero, but the
negative value of λ implies that the χ² test rejects the linear model
at a much higher significance level.
[Figure 1: Log-likelihood values by lambda]
This is not the first time that Box-Cox transformations have been
used to test alternative specifications in the human capital literature.
Heckman and Polacheck (1974) performed similar experiments with the 1960
and 1970 public use samples of the U.S. census. Their conclusion is
similar to ours -- that among simple transformations, the natural logarithm
of earnings is the correct dependent variable in earnings functions.
B. Homoscedasticity on the schooling dimension
In the basic specification of the model it is assumed that the
disturbance term is homoscedastic with respect to schooling. This can be
checked by disaggregating the sample by level of schooling, running
regressions for earnings with respect to experience and its square for each
category separately, and calculating the mean square residual.
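The check described here can be sketched as follows. The data are synthetic, with deliberately different disturbance variances in two groups, and the function names are hypothetical; the mean square residual is taken as the residual sum of squares divided by the residual degrees of freedom, which is an assumption about the exact divisor.

```python
import numpy as np


def mean_square_residual(dep, x):
    """OLS of dep on a constant, x and x squared; returns RSS / (n - 3)."""
    X = np.column_stack([np.ones_like(x), x, x ** 2])
    beta, *_ = np.linalg.lstsq(X, dep, rcond=None)
    resid = dep - X @ beta
    return float(resid @ resid) / (len(dep) - 3)


def msr_by_group(dep, x, group):
    """Fit the experience profile separately within each schooling group."""
    return {str(g): mean_square_residual(dep[group == g], x[group == g])
            for g in np.unique(group)}


# Synthetic illustration (NOT the Brazilian sample): disturbance standard
# deviations of 0.5 and 0.6 in two hypothetical certificate groups.
rng = np.random.default_rng(1)
n = 4000
group = rng.choice(np.array(["NIL", "PL"]), size=n)
x = rng.uniform(0, 40, size=n)
sigma = np.where(group == "NIL", 0.5, 0.6)
ln_y = 8.0 + 0.05 * x - 0.0008 * x ** 2 + rng.normal(0, 1, size=n) * sigma

msr = msr_by_group(ln_y, x, group)
```

Comparing the resulting values across groups, for the logarithm of earnings and for earnings as such, gives the kind of evidence on heteroscedasticity that the disaggregated regressions are meant to provide.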
Table 2: Mean Square Error of Residuals by Certificate
and by Experience Category

                             Dependent Variable
Certificate               Y (x 10^6)        Ln Y
------------------------------------------------
None                            36          0.29
Primary lower                   75          0.32
Primary upper                  163          0.38
Secondary general              457          0.48
Secondary technical            368          0.45
Higher science                1540          0.39
Higher social sci             1079          0.32
Higher mgt & agric            1234          0.64
Experience (years)
≤ 10                            65          0.24
≥ 19                           215          0.41
Table 2 presents the mean square residual for each category, for
both the semi-logarithmic and linear specifications of the dependent
variable. The linear specification is clearly subject to very severe
heteroscedasticity. Almost inevitably in view of the large size of the
sample, formal F-tests indicate that the heteroscedasticity is still
significant in the semi-logarithmic specification, but it is relatively
mild.
The near-homoscedasticity in the semi-logarithmic version was by
no means a foregone conclusion since there is no theoretical apparatus
predicting it. Indeed it would not have been a surprise to have found
heteroscedasticity so severe that it would have led to the abandonment of
the use of a single, combined earnings function for all levels of
education.
C. Homoscedasticity on the experience dimension
For the purpose of evaluating heteroscedasticity in the experience
dimension, earnings functions were fitted using the subsamples containing
those with the least experience and those with the most experience, the
cut-off points following the guidelines of Goldfeld and Quandt (1965) 3.
3    Those with least experience had 10 or fewer years of work experience
(5704 cases, 37% of the total); those with most experience had
19 or more years of work experience (5695 cases).
Table 2 presents the mean square error for each subsample for
basic regressions using ln y and y as dependent variable. There is
evidence of significant heteroscedasticity in both cases but it is much
less severe for the regression using ln y.
D. Normality of the distribution of the residuals
Finally the distribution of the residuals was tested for normal-
ity. The unbiasedness and efficiency of OLS do not depend upon any
assumption concerning the distribution of the disturbance term. Neverthe-
less, in view of the fact that earnings functions seldom account for more
than 50% of the variance in earnings and that the residual variance is
popularly attributed to a multitude of factors, it is reasonable to expect
the Central Limit Theorem to apply and the disturbance term, and hence
residuals, to approximate a normal distribution. Moreover, the validity of
the t-tests and F-tests depends upon such an approximation.
[Figure 2: Distributions of residuals, semi-logarithmic and linear earnings
functions, standardized by division by the standard error of the regression.
Panels: Semi-logarithmic; Linear.]
Figure 2 presents histograms for the distribution of the residuals
using the semi-logarithmic and linear specifications, both standardized by
division by the standard error of the regression. Both distributions are
significantly different from normal, but that for the semi-logarithmic
regression conforms much more closely than that for the linear
regression, which is far more peaked and long-tailed 4.
4   Adopting the 0.33 standard deviation intervals used in the his-
tograms, and amalgamating into single categories the tails
beyond two standard deviations, the χ² statistics were 79.0
and 8,425 for the semi-logarithmic and linear specifications,
respectively. With 12 degrees of freedom, the critical level
of χ² is 26.2 at the 1% significance level. We are indebted to
J.J. Thomas for proposing this test.
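A goodness-of-fit statistic of the kind described in footnote 4 can be sketched as below. The cell width is taken as exactly one third of a standard deviation (the footnote gives 0.33), the two tails beyond two standard deviations are each pooled into one cell, and the samples are synthetic; the function names and both samples are assumptions made for illustration.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def chi2_normality(std_resid, width=1.0 / 3.0, limit=2.0):
    """Chi-square statistic comparing standardized residuals with N(0, 1).

    Returns the statistic and the number of cells (inner cells plus two
    pooled tails beyond +/- limit).
    """
    n_inner = round(2.0 * limit / width)
    edges = [-limit + i * width for i in range(n_inner + 1)]
    observed = [0] * (n_inner + 2)
    for r in std_resid:
        if r < -limit:
            observed[0] += 1
        elif r >= limit:
            observed[-1] += 1
        else:
            cell = min(int((r + limit) / width), n_inner - 1)
            observed[cell + 1] += 1
    cdf = [0.0] + [normal_cdf(e) for e in edges] + [1.0]
    n = len(std_resid)
    expected = [n * (cdf[i + 1] - cdf[i]) for i in range(n_inner + 2)]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed)


# Synthetic samples (NOT the paper's residuals).
random.seed(0)
normal_sample = [random.gauss(0.0, 1.0) for _ in range(5000)]
# Laplace draws scaled to unit variance: more peaked and longer-tailed.
laplace_sample = [random.choice((-1.0, 1.0)) * random.expovariate(1.0) / math.sqrt(2.0)
                  for _ in range(5000)]

stat_normal, cells = chi2_normality(normal_sample)
stat_laplace, _ = chi2_normality(laplace_sample)
```

The peaked, long-tailed sample produces a far larger statistic than the normal one, which is the pattern the footnote reports for the linear specification relative to the semi-logarithmic one.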
IV. Specification of the explanatory variables
In this section, we examine the empirical nature of the assump-
tions regarding the right hand side of earnings functions.
A. Interaction of the effects of schooling and experience on earnings
It is commonly accepted that the age-earnings profiles of those
with the lowest levels of schooling tend to rise relatively slowly after
the first few years of work experience. In the case of unskilled manual
workers, they are likely to reach an absolute plateau and in middle age
begin to fall as physical powers decline. By contrast, the earnings of
those with extended schooling continue to grow throughout their working
lives and the rate of growth is positively correlated with the level of
schooling.
These stylized facts are faithfully reproduced in diagrams
depicting typical earnings profiles by level of education. It is therefore
surprising that they are not similarly reflected in the specification of
regression models: typically the work experience variable and its square
appear in the regression equation unaccompanied by schooling interactives
and their coefficients are therefore interpreted as applying independently
of schooling level. The equation with interactive terms (which we call the
"basic specification") is:
(6)      \ln Y_i = a + \sum_k b_k D_{ik} + c x_i + d x_i^2
                   + \sum_k c_k D_{ik} x_i + \sum_k g_k D_{ik} x_i^2 + u_i
where c_k and g_k denote the coefficients of the interactive terms.
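The regressors implied by the specification with interactive terms can be laid out explicitly. The sketch below builds one row of the design matrix for a given certificate level and experience value; the variable labels follow Table 3, while the function names and the choice of "no certificate" as the omitted category are assumptions consistent with the tables rather than code from the source.

```python
# Certificate levels with their own dummy; "NIL" (no certificate) is the base.
LEVELS = ["PL", "PU", "SECG", "SECT", "HISCI", "HIMGT", "HISOC"]


def column_names():
    """Names of the regressors: constant, dummies, X, X2 and interactives."""
    return (["Constant"] + LEVELS + ["X", "X2"]
            + [f"X*{k}" for k in LEVELS]
            + [f"X2*{k}" for k in LEVELS])


def design_row(level, x):
    """One row of regressors for an individual with the given level and x."""
    dummies = [1.0 if level == k else 0.0 for k in LEVELS]
    return ([1.0] + dummies + [x, x * x]
            + [d * x for d in dummies]
            + [d * x * x for d in dummies])


# Hypothetical individual: upper primary certificate, 10 years of experience.
row = dict(zip(column_names(), design_row("PU", 10.0)))
base_row = design_row("NIL", 5.0)
```

Each level therefore has its own intercept shift, its own slope on experience and its own curvature, while the base category is described by the constant, X and X2 alone.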
Table 3: Private Sector Semi-Logarithmic Earnings Functions in Brazil, 1980

                  With interactives              Without interactives
Variables      Coefficient  Std. error       Coefficient  Std. error
---------------------------------------------------------------------
Constant         8.29016     0.02090            8.01946     0.01434
X                0.04678     0.00199            0.07086     0.00129
X2              -0.00088     0.00004           -0.00126     0.00003
PL               0.05306     0.02771            0.36720     0.01088
PU               0.09512     0.04139            0.69072     0.01619
SECG             0.57692     0.06463            1.18726     0.02266
SECT             0.45868     0.07727            1.16037     0.02781
HISCI            1.91258     0.08984            2.32203     0.03997
HIMGTAG          1.56187     0.13120            2.09362     0.04268
HISOC            1.48525     0.16825            1.72776     0.04894
X*PL             0.02832     0.00286
X*PU             0.05747     0.00469
X*SECG           0.05519     0.00780
X*SECT           0.06266     0.00828
X*HISCI          0.04608     0.01076
X*HIMGT          0.05733     0.01650
X*HISOC          0.00997     0.01828
X2*PL           -0.00044     0.00006
X2*PU           -0.00085     0.00011
X2*SECG         -0.00072     0.00019
X2*SECT         -0.00082     0.00018
X2*HISCI        -0.00087     0.00022
X2*HIMGT        -0.00113     0.00044
X2*HISOC         0.00021     0.00043
---------------------------------------------------------------------
R-Squared        0.46664                        0.500
N                15,523                        15,523
Table 3 presents the regression results including and excluding
the interactive terms, respectively. The x-interactives all have the
expected positive sign and those for the two levels of primary education
and the two types of secondary education are all significant at the 1%
level. The x2-interactives are likewise significantly negative for the
lower levels of schooling.
The main consequence of omitting the interactives is to overestimate
the initial upward shift of the earnings profile associated with
progressively greater amounts of education. In the case of lower primary
education, for example, the full specification suggests that the initial
shift is a modest 5.3%, but this of course increases over time since the
earnings of those with lower primary education grow faster than those with
no certificate: by the twentieth year of work experience, mid-way through
the individual's working life, the differential would be 55%. The
specification without interactives, constrained to yield an average figure,
suggests that lower primary education results in a once-and-for-all
relative shift of 44%, effective immediately 5.
5    Throughout the text we calculate income differentials by
comparing the absolute earnings predicted by the logarithmic
functions. By way of illustration, the coefficient of lower
primary in the interactiveless specification, 0.3672, implies
that the earnings of lower primary graduates are higher than
the earnings of those with no certificate by a factor
exp(0.3672), that is, 1.44, implying a differential of 44%.
Similar remarks apply to the estimates of the impact of other
levels of education. The implications of this distortion for
rate-of-return analysis are obvious. By exaggerating the initial impact of
education, the interactive-less specification will systematically tend to
lead to overestimates of the rate of return to it. This point is explored
further in Section V.
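The lower primary example can be reproduced from the Table 3 coefficients using the method of footnote 5, that is, by exponentiating the predicted difference in log earnings. The function name is an assumption; the coefficients are those reported in Table 3.

```python
import math

# Table 3, specification with interactives: lower primary (PL) terms.
B_PL = 0.05306    # intercept shift
C_PL = 0.02832    # coefficient of X*PL
G_PL = -0.00044   # coefficient of X2*PL

# Table 3, specification without interactives.
B_PL_NO_INTERACTIVES = 0.36720


def differential_pl(x):
    """Proportional earnings gap, lower primary over no certificate, at x."""
    return math.exp(B_PL + C_PL * x + G_PL * x * x) - 1.0


gap_at_entry = differential_pl(0)       # roughly 0.054
gap_at_20_years = differential_pl(20)   # roughly 0.558
gap_without_interactives = math.exp(B_PL_NO_INTERACTIVES) - 1.0  # roughly 0.444
```

The gap at entry comes out at about 5.4% when exponentiated (the coefficient itself is 5.3%), rising to about 55% after twenty years, against a constant 44% when the interactives are omitted.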
B. Modelling early labour market experience
The literature on occupational training suggests that early labour
market experience differs from later experience in two respects: (i) the
first few years of labour market experience are a time for experiment and
for testing the job market, leading to relatively frequent job change
(Grasso and Shea 1979); (ii) a major characteristic of many entry-level
jobs is their training function (Thurow, 1980). As a consequence it is
commonly accepted that the earnings of many individuals rise relatively
rapidly, in proportional terms at least, in their first few years in the
labour force, and then settle down to a more sedate rate of growth.
Table 4: Private Sector Semi-Logarithmic Earnings Functions in Brazil, 1980
Variations on Basic Specification
           X = Min{Age-school-6, Age-15}                    X = Age-school-6
           For X ≤ 10              For X > 10
Variables   Coeff.   Std.err.      Coeff.   Std.err.       Coeff.       Std.err.
Constant    8.07407  0.03992       8.60137  0.05782        7.92317      0.03768
X           0.10871  0.01665       0.02409  0.00429        0.05620      0.00254
X2         -0.00457  0.00151      -0.00052  0.00007       -0.00081      0.00004
PL          0.12301  0.05195       0.09487  0.08248        0.06962      0.04705
PU          0.09761  0.08487       0.16167  0.13092        0.32979      0.05635
SECG        0.93556  0.15774       0.44652  0.20149        0.94391      0.07218
SECT        0.53746  0.24680       0.07684  0.20474        0.82567      0.08383
HISCI       1.80528  0.22592       1.78286  0.32872        2.27956      0.09568
HIMGTAG     1.41521  0.26443       1.60298  0.40962        1.92886      0.13571
HISOC       1.67729  0.51904       2.52011  0.45173        1.85224      0.17223
X*PL        0.00240  0.02147       0.02240  0.00644        0.02639      0.00353
X*PU        0.05355  0.03228       0.04847  0.01079        0.04939      0.00519
X*SECG     -0.08072  0.05417       0.06185  0.01773        0.04576      0.00800
X*SECT      0.01682  0.08607       0.08881  0.01668        0.05323      0.00848
X*HISCI     0.08434  0.08383       0.05219  0.02607        0.03665      0.01094
X*HIMGT     0.09340  0.09474       0.04342  0.03916        0.04791      0.01668
X*HISOC    -0.04945  0.17580      -0.07702  0.03979        0.00054      0.01847
X2*PL       0.00158  0.00194      -0.00030  0.00011       -0.00041      0.00006
X2*PU      -0.00053  0.00278      -0.00065  0.00020       -0.00084      0.00010
X2*SECG     0.00992  0.00434      -0.00077  0.00035       -0.00079      0.00019
X2*SECT     0.00395  0.00687      -0.00122  0.00030       -0.00090      0.00018
X2*HISCI   -0.00349  0.00701      -0.00096  0.00042       -0.00095      0.00022
X2*HIMGT   -0.00274  0.00770      -0.00070  0.00084       -0.00120      0.00044
X2*HISOC    0.00145  0.01342       0.00182  0.00078        0.00014      0.00043
----------------------------------------------------------------------------
R-Squared   0.51671                0.40949                 0.45956
This stylized fact is also neglected in the econometric litera-
ture. The left-hand and middle double columns of Table 4 show the results
of splitting the Brazilian sample into those who had no more than 10 years
of work experience, and more than 10, respectively 6. The F-statistic for
6    Splits were evaluated at 6, 8, 10 and 12 years of work experi-
ence, the results suggesting that the discontinuity is most
distinct at 10. Similar splits were evaluated for each level
of education separately, with the finding that there is a positive
correlation between the level of education and the length
of the first phase of labour market experience.
the Chow test for the split is 6.72, significant at the 0.1% level.
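For reference, the standard Chow statistic compares the residual sum of squares of the pooled regression with the sum from the two subsample regressions. The sketch below uses the textbook formula with hypothetical inputs; the source reports only the resulting F-statistic, so the residual sums of squares and sample sizes shown are illustrative and not taken from the paper.

```python
def chow_f(rss_pooled, rss_1, rss_2, n_1, n_2, k):
    """Chow F-statistic for a split into two subsamples.

    k is the number of estimated coefficients in each regression
    (24 in the specification with interactives, counting the constant).
    """
    rss_split = rss_1 + rss_2
    numerator = (rss_pooled - rss_split) / k
    denominator = rss_split / (n_1 + n_2 - 2 * k)
    return numerator / denominator


# HYPOTHETICAL round numbers, chosen only to make the arithmetic transparent.
f_stat = chow_f(rss_pooled=110.0, rss_1=50.0, rss_2=50.0, n_1=30, n_2=30, k=5)
```

Under the null hypothesis of a single function, the statistic is referred to the F distribution with k and n_1 + n_2 - 2k degrees of freedom.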
Table 5: Rate of Growth of Private Sector Earnings by Schooling Level
And by Experience Level, Brazil, 1980
(percent per year of experience)

Schooling    Basic Specification       Split Specification
Level         X = 0     X = 20          X = 0     X = 20
----------------------------------------------------------
None           4.7        2.9           10.9        1.4
PL             7.5        4.9           11.1        3.0
PU            10.4        7.0           16.2        4.9
SECG          10.2        7.0            2.7        6.0
SECT          10.9        7.5           12.6        7.8
HISCI          9.3        5.8           19.3        4.7
HIMGT         10.4        6.4           20.2        4.3
HISOC          5.7        4.3           27.7       -3.1
Table 5 summarizes the rates of growth of earnings by educational
level for x equal to zero and 20 predicted by the basic specification and
the split function. With the exception of secondary general education, it
can be seen that the rates of growth of earnings are indeed initially
greater, and later smaller, than suggested by the basic specification in
Table 3. Again, there are obvious implications for the estimation of rates
of return and they are discussed in Section V.
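The basic-specification columns of Table 5 appear to be reproducible from the Table 3 coefficients as 100 x [(c + c_k) + (d + g_k) x]. This formula is inferred from the reported figures and is not stated in the text; note that it differs from the instantaneous derivative of equation (6), which would carry a factor of two on the quadratic term. The sketch below checks it for three levels.

```python
# Table 3, specification with interactives.
C = 0.04678    # coefficient of X
D = -0.00088   # coefficient of X2

# (c_k, g_k): coefficients of X*level and X2*level; zero for the base category.
INTERACTIVES = {
    "None": (0.0, 0.0),
    "PL": (0.02832, -0.00044),
    "PU": (0.05747, -0.00085),
}


def growth_percent(level, x):
    """Inferred Table 5 formula: 100 * [(c + c_k) + (d + g_k) * x]."""
    c_k, g_k = INTERACTIVES[level]
    return round(100.0 * ((C + c_k) + (D + g_k) * x), 1)


table = {level: (growth_percent(level, 0), growth_percent(level, 20))
         for level in INTERACTIVES}
```

The values obtained for these three levels match the corresponding entries of Table 5 to one decimal place.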
C. Estimation of work experience
The traditional definition for estimating years of work experience,
(age - years of schooling - 6), makes no distinction between "work
experience" acquired as a child and adult work experience. We have
discounted the former by using instead the expression min{(age - years of
schooling - 6), (age - 15)}. The effect of using the traditional
expression is to shift the experience-earnings functions for the affected
categories, those with no certificate and lower primary graduates, to the
right. Since these profiles are parabolas, the effect of shifting them to
the right is to lower their intercepts and thus to increase the difference
between the initial predicted earnings of these categories and those of the
first relatively unaffected category, upper primary graduates. The result is
to overestimate the rate of return to upper primary education.
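The two experience measures can be compared directly. The following is a minimal sketch of the expressions given in the text; the function names and the example workers are hypothetical.

```python
def traditional_experience(age, years_schooling):
    """Conventional proxy: every year since leaving school counts as experience."""
    return age - years_schooling - 6

def modified_experience(age, years_schooling, min_working_age=15):
    """Modified proxy from the text: experience before age 15 is not counted."""
    return min(age - years_schooling - 6, age - min_working_age)

# Hypothetical 30-year-old workers with 0, 4, 8 and 11 years of schooling.
for s in (0, 4, 8, 11):
    print(s, traditional_experience(30, s), modified_experience(30, s))
```

The two measures coincide once schooling is long enough for the worker to leave school at 15 or later, so only the least educated categories are materially affected.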
This point is illustrated by the last two columns of Table 4 which
present the results of using the traditional expression in our basic model
specification. The intercepts for those with no certificate, lower primary
and upper primary education are 7.92, 7.99 and 8.25; the corresponding
estimates using the modified expression for work experience (Table 3, first
two columns) are 8.29, 8.34 and 8.39, respectively. The remaining
intercepts are virtually unaffected.
If early labour market experience is modelled separately from
mature experience, the distortions caused by using the traditional
definition are even more pronounced: by causing nearly all workers to fall
into the mature subset, it almost precludes any serious attempt to model
early labour market experience for the lowest levels of education even when
the split is made as late as ten years of experience.
D. Certification effects
The basic specification of the regression model makes no distinc-
tion between the effect of years of schooling by level on earnings and the
effect of obtaining the corresponding certificate. The second double
column of Table 6 presents the results obtained when these effects are
separated, and for purposes of comparison the first double column presents
the results obtained when certificates are omitted. The first double
column can be regarded as the counterpart of the first double column of
Table 3 when schooling is treated as a splined continuous variable instead
of as a set of dummy variables.⁷
7    This specification is essentially a variation of the original
Mincerian model in which years of schooling is treated as a
single, continuous variable. The Mincerian specification
embodies the assumption that all years of schooling make the
same proportional contribution to earnings. The splined ver-
sion allows the contribution of each year of schooling to vary
according to educational level. A second difference in the
version presented here is that the years of schooling variables
are accompanied by interactive terms with experience and its
square.
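One way to construct the splined schooling variables and their interactions with experience is sketched below. The level boundaries (4, 8, 11 and 15 cumulative years) are an assumption made for illustration and are not taken from the paper; the variable names follow those in Table 6.

```python
# Hypothetical cumulative years at which each level ends; these durations are
# an assumption for illustration and are not taken from the paper.
LEVEL_ENDS = {"PL": 4, "PU": 8, "SEC": 11, "HI": 15}

def splined_schooling(years):
    """Split total years of schooling into years completed within each level."""
    segments, start = {}, 0
    for level, end in LEVEL_ENDS.items():
        segments["YRS" + level] = max(0, min(years, end) - start)
        start = end
    return segments

def regressors(years, x):
    """Spline segments plus their interactions with experience x and x squared."""
    row = {"X": x, "X2": x * x}
    for name, value in splined_schooling(years).items():
        row[name] = value
        row["X" + name] = x * value
        row["X2" + name] = x * x * value
    return row

print(regressors(10, 5))
```

Because each level has its own segment, the coefficient on each segment measures the contribution of a year of schooling at that level only, which is the sense in which the contribution is allowed to vary by level.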
Table 6: Private Sector Earnings Functions, Brazil, 1980
Certification Effects

             Certificates omitted     Certificates included
Variables    Coeff.     Std. err.     Coeff.     Std. err.
Constant      8.26093   0.03024        8.28086   0.03040
X             0.03313   0.00275        0.03310   0.00274
X2           -0.00061   0.00005       -0.00062   0.00005
YRSPL         0.01633   0.00968       -0.00668   0.01071
YRSPU         0.00593   0.01012       -0.01400   0.01167
YRSSEC        0.11386   0.01817        0.06380   0.02392
YRSHI         0.28433   0.01861        0.25418   0.02683
XYRSPL        0.00909   0.00093        0.00915   0.00093
XYRSPU        0.00791   0.00115        0.00775   0.00115
XYRSSEC       0.00316   0.00213        0.00309   0.00212
XYRSHI       -0.00815   0.00214       -0.00484   0.00218
X2YRSPL      -0.00015   0.00002       -0.00015   0.00002
X2YRSPU      -0.00011   0.00003       -0.00011   0.00003
X2YRSEC      -0.00002   0.00005       -0.00002   0.00005
X2YRSHI       0.00007   0.00005        0.00001   0.00005
PL                                     0.09713   0.01781
PU                                     0.17896   0.03623
SECG                                   0.34213   0.06591
SECT                                   0.31440   0.06774
HISCI                                  0.56095   0.11436
HIMGT                                  0.42733   0.10700
HISOC                                  0.01865   0.11157
R-Squared     0.48034                  0.48452

The coefficients of the certificate dummies all have the expected
positive sign and, in spite of the problem of multicollinearity, the
majority are significantly different from zero at the 1% level.
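The significance claim can be checked from the coefficients and standard errors in Table 6. The sketch below computes the t-ratios of the certificate dummies; the use of the large-sample normal critical value of 2.576 for a two-sided 1% test is an assumption, as the paper does not state the degrees of freedom here.

```python
# Certificate coefficients and standard errors from Table 6 (second double column).
CERTIFICATES = {
    "PL": (0.09713, 0.01781),
    "PU": (0.17896, 0.03623),
    "SECG": (0.34213, 0.06591),
    "SECT": (0.31440, 0.06774),
    "HISCI": (0.56095, 0.11436),
    "HIMGT": (0.42733, 0.10700),
    "HISOC": (0.01865, 0.11157),
}
CRITICAL_1PCT = 2.576  # assumed: two-sided 1% value, normal approximation

def t_ratios(estimates):
    """Ratio of each coefficient to its standard error."""
    return {name: coeff / se for name, (coeff, se) in estimates.items()}

ratios = t_ratios(CERTIFICATES)
significant = [name for name, t in ratios.items() if abs(t) > CRITICAL_1PCT]
for name, t in ratios.items():
    print(f"{name:6s} t = {t:5.2f}")
print(significant)
```

On these figures six of the seven certificate dummies exceed the critical value, the exception being higher education in the social sciences (HISOC).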
Although these results suggest that employers are affected by
credentialism in their wage-setting, this is not the only possible
interpretation. An alternative possibility is that those who complete each
level of education are intrinsically more able than those who do not. A
further, related, explanation is that those who complete each level extract
more from it than those who drop out and presumably were struggling. The
certificate coefficients therefore may equally be regarded as evidence of
credentialism, of screening for ability, or of a true educational effect.
These results are similar to those found recently for US data by Hungerford
and Solon (1987).
V. The Sensitivity of Rates of Return to Specification
In this section we compare several methods of computing the rates
of return to various levels of education. When interactive experience-
schooling effects are introduced into the regression model, the earnings
functions by educational level cease to be isomorphic and it is no longer
possible to read off a crude estimate of the rate of return in the
Mincerian manner. One is forced to return to the more laborious but
theoretically more satisfactory procedure of calculating earnings streams
explicitly and using an iterative procedure to calculate an internal rate
of return. This complicates the comparison of the rates of return using
the standard Mincerian model and more sophisticated ones. In part the
discrepancies are attributable to the use of the short-cut technique in the
Mincerian estimates. In part Mincerian and more elaborate methods are
different because they employ different specifications of the earnings
function.
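The iterative procedure referred to above can be illustrated with a simple bisection search for the discount rate at which the net present value of the difference between two earnings streams is zero. This is a sketch under stated assumptions rather than the procedure actually used in the paper: the function names, the yearly discounting convention and the cash flows are all hypothetical.

```python
def npv(rate, cash_flows):
    """Net present value of yearly cash flows, the first occurring at time zero."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))

def irr(cash_flows, lo=-0.5, hi=10.0, tol=1e-10):
    """Bisection search for the discount rate at which the NPV is zero."""
    npv_lo = npv(lo, cash_flows)
    if npv_lo * npv(hi, cash_flows) > 0:
        raise ValueError("NPV does not change sign on the search interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        npv_mid = npv(mid, cash_flows)
        if npv_lo * npv_mid <= 0:
            hi = mid
        else:
            lo, npv_lo = mid, npv_mid
    return 0.5 * (lo + hi)

# Hypothetical stream: four years of study, each costing 100 in foregone
# earnings plus 20 in direct costs, then a premium of 30 a year for 40 years.
net_benefits = [-120.0] * 4 + [30.0] * 40
rate = irr(net_benefits)
print(f"{100 * rate:.1f}%")
```

In an application the net benefit in each year would be the difference between the earnings predicted for the two educational levels at the relevant experience, less any direct costs during the years of study.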
A. Short-cut and IRR versions of the Mincerian model
In the short-cut version of the Mincerian model the rate of
return to education is estimated directly from the regression results.
When schooling is treated as a continuous, cardinal variable, its
coefficient is the estimate of the rate of return. In the dummy variable
approach used here, the rate of return to each level is calculated from the
coefficients of the schooling dummies using equation (3). This procedure
involves three assumptions:
1. Direct costs are negligible, or are offset by a student's
   part-time and summer earnings.
2. The opportunity cost of foregone earnings is equal to the
   earnings of the next lower level predicted by the model.
3. The earnings profiles are isomorphic, that is, they are of the
   form $y_0 f(x)$, where $y_0$ is the initial earnings of the
   educational category in question and $f(x)$ is a multiplicative
   experience function common to all educational levels.⁸
8   A proof of the validity of the short-cut method, subject to these
assumptions, is provided in the appendix.
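As a minimal illustration of the short-cut calculation, the difference between the coefficients of two adjacent schooling dummies is divided by the number of years taken by the higher level. The coefficients and the duration below are hypothetical and are not taken from the regression tables.

```python
def shortcut_rate(coeff_level, coeff_lower_level, years):
    """Short-cut estimate: difference of log-earnings dummy coefficients per year of study."""
    return (coeff_level - coeff_lower_level) / years

# Hypothetical coefficients and duration, for illustration only.
rate = shortcut_rate(coeff_level=0.80, coeff_lower_level=0.44, years=3)
print(f"{100 * rate:.1f}%")
```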
In the more satisfactory, but more laborious, internal rate of
return (IRR) version, account is taken of direct costs⁹ and the foregone
earnings of children are treated more appropriately. The third assumption
is maintained.
9    A recent monograph by Winkler (1986) yields the following costs
(average of federal, state and municipal levels) of education:

                              1980 Cruzeiros
Primary overall                      811
    Lower                            759
    Upper                            949
Secondary overall                  1,316
    General                          995
    Technical                      1,742
Higher overall                    13,842
    Sciences                      15,820
    Management                    11,865
    Social Sciences               11,865

Winkler's data allow us to compute unit costs overall for each
educational level. To assign these costs to the subheadings
within levels, we assumed that unit costs at upper primary
exceed those at lower primary by 25%, that unit costs of secondary
technical exceed those of secondary general by 75%, and that unit costs
for science exceed those for management and social sciences
by 25%.
Table 7: Rates of Return under Alternative Specifications
Brazil 1980 Private Sector by Schooling Level
(percentages)

                                       Primary      Secondary         Higher
Specification                        Lower  Upper   Gen   Tech   Sci  Mgt/Ag  Soc
Mincerian
1. Coefficient-difference method        9      8     12     12    28     23    14
2. IRR method                          38     35     14     12    21     18    10
With experience interactions
3. Basic                               24     25     12     10    20     17     9
4. With experience spline              25     24     12     10    20     16     8
5. With conventional experience        22     31     14     12    20     17     9
   measure

We are concerned only with the effect of the third assumption
after the first two have been relaxed. However since the majority of
Mincerian studies use the short-cut model, we begin by comparing the
results obtained using it with those obtained using the IRR version and the
Brazilian data. The rates of return for the different levels of education
using the short-cut and IRR methods are presented in the first two lines of
Table 7. The inclusion of direct costs causes the estimates of the rate of
return to secondary and higher education to be lower in the IRR version
than in the short-cut version. However, in the case of lower and upper primary
education the assumption that foregone earnings are in fact negligible
causes the IRR estimates to be substantially higher.
B. Mincerian versus specification with interactives
Next we compare the rates of return using the IRR version of the
Mincerian model with those obtained using its counterpart including
interactive experience-education terms (line 3 of Table 7). The greatest
impact is on the estimates of the rates of return to lower and
upper primary education, which are significantly lower in the interactive
version, the reason being that the Mincerian specification causes the
initial impact of these levels of education to be overestimated.
C. Variations on the specification with interactives
Finally we evaluate the effect of using the earnings profiles
splined by early and later work experience discussed in Section IV.B, and
the effect of the revised measure of experience discussed in Section IV.C.
Despite the significant Chow tests, the introduction of splines appears to
have a negligible effect on the estimates of rates of return (Table 7, line
4) and clearly in this case was a refinement of secondary importance. The
measurement of the work experience variable is however a more significant
issue: the conventional method gives rise to a substantial upward bias in
the estimate of the rate of return to upper primary education, and, to a
lesser extent, in the estimates for lower primary and secondary education
(Table 7, line 5).
D. Private and social rates of return
The short-cut version of the Mincerian model is sometimes
described as yielding estimates of the private rate of return, while the
IRR version (and the interactive models discussed here) are described as
yielding social estimates. We are agnostic on these interpretations in
this paper because we are chiefly concerned about the magnitude of the
impact of alternative methodologies in computing rates of return.
Moreover, we have no information on the private direct costs of education
(uniforms, transport, charges for textbooks, exercise books, pencils and
other materials) and the effects of direct taxation.
VI. Public Sector Employment and Self-Employment
In addition to the data on private sector employees, the sample
contained smaller data sets on public sector employees and the self-
employed (see Table 1).
A regression using the basic specification with interactive
variables (equation 6) and the public sector data yielded a better fit (R²
equal to 0.57 for the public sector, 0.47 for the private sector) and
similar coefficients¹⁰ (Table 8).
10   A Chow test indicated that the fits were nevertheless signifi-
cantly different. Half of the discrepancy between the residual
sum of squares for the pooled and separate regressions could be
accounted for by a simple sector dummy, but the difference
remained significant after its inclusion.
A similar regression for the self-employed yielded, as anticipated,
a much inferior fit (R² equal to 0.25). The intercept dummies were,
with the exception of higher education, science, considerably larger than
those for the private sector sample and the experience-education interac-
tives were, with the same exception, not significantly different from zero
(Table 8). The F-statistic for the explanatory power of the interactive
variables as a group is 1.68, just significant at the 5% level. It follows
that for this subsample the traditional Mincerian specification would have
been approximately appropriate.
Table 8: Public and Own Account Sectors
Semi-Logarithmic Earnings Functions in Brazil, 1980
Variations on Basic Specification

               Public Sector           Own Account
Variables    Coeff.   Std.err.      Coeff.   Std.err.
Constant     7.9779   0.1222        8.2825   0.0549
X            0.0540   0.0089        0.0510   0.0042
X2          -0.0008   0.0002       -0.0009   0.0001
PL           0.1220   0.1470        0.3658   0.0787
PU           0.1047   0.1613        0.7106   0.1361
SECG         0.7154   0.1856        0.7594   0.2110
SECT         0.4859   0.2377        1.2039   0.3172
HISCI        1.7857   0.1986        1.7278   0.2404
HIMGTAG      1.2418   0.3725        1.7404   0.4911
HISOC        1.6660   0.2572        1.7772   0.3857
X*PL         0.0235   0.0116        0.0098   0.0064
X*PU         0.0483   0.0140       -0.0042   0.0126
X*SECG       0.0270   0.0166        0.0238   0.0225
X*SECT       0.0342   0.0221       -0.0275   0.0293
X*HISCI      0.0311   0.0201        0.0561   0.0260
X*HIMGT      0.0471   0.0407       -0.0234   0.0604
X*HISOC      0.0219   0.0224        0.0017   0.0375
X2*PL       -0.0003   0.0002       -0.0002   0.0001
X2*PU       -0.0006   0.0003        0.0002   0.0002
X2*SECG     -0.0002   0.0003       -0.0003   0.0005
X2*SECT     -0.0005   0.0004        0.0006   0.0006
X2*HISCI    -0.0005   0.0004       -0.0016   0.0006
X2*HIMGT    -0.0005   0.0010        0.0003   0.0015
X2*HISOC    -0.0003   0.0004       -0.0004   0.0008
R-Squared    0.5685                 0.2549

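The interactive coefficients in Table 8 are easier to interpret when converted into implied rates of growth of earnings, in the manner of Table 5. The sketch below does this for the public sector, on the assumption that the growth rate is the derivative of log earnings with respect to experience and that workers with no certificate form the reference category; the function name is hypothetical.

```python
# Public sector coefficients from Table 8 for experience (X), its square (X2)
# and the lower primary interactions (X*PL, X2*PL).
B_X, B_X2 = 0.0540, -0.0008
B_X_PL, B_X2_PL = 0.0235, -0.0003

def growth_rate(x, b_x, b_x2):
    """Derivative of log earnings with respect to experience: b_x + 2 * b_x2 * x."""
    return b_x + 2.0 * b_x2 * x

for label, b1, b2 in (("None", B_X, B_X2), ("PL", B_X + B_X_PL, B_X2 + B_X2_PL)):
    print(label, [round(100 * growth_rate(x, b1, b2), 2) for x in (0, 20)])
```

Under these assumptions the implied public sector growth rates are about 5.4% initially and 2.2% after 20 years for the reference category, and about 7.8% and 3.4% for lower primary graduates.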
The earnings function specification described in Section IV.D was
used to detect certification effects. As anticipated, they were
significant at all levels for the public sector and stronger than in the
case of the private sector. Again, as anticipated, they were largely
absent in the self-employed subsample. Only that for lower primary was
significantly different from zero. As noted in Section IV.D, there are
several possible explanations of certification effects. These findings
support the traditional sheep-skin explanation for all but lower primary.
For this, the explanation that completers are inherently different from
non-completers may be more appropriate.
VII. Conclusions
The empirical results provide striking confirmation of the
superiority of the semi-logarithmic earnings function over its linear
counterpart. The semi-logarithmic version is supported by the Box-Cox
transformation, by relative homoscedasticity in both the schooling and work
experience dimensions, and by the relatively normal distribution of the
residuals.
However, the results also indicate that the standard Mincerian model errs in
neglecting interactive effects between work experience and schooling and in
not making a distinction between the modelling of initial and later
earnings.
The Brazilian results suggest that the biggest differences in the
contribution of work experience to the growth of earnings occur at the
lowest levels of education, and hence that the standard specification is
likely to overestimate the rate of return at these levels, a bias which is
likely to be aggravated by the traditional method of estimating work
experience. This conclusion must however be tempered by the finding that
the interactive effects appear to be confined primarily to mature earnings
and hence will have greatest impact when the rate of return is low.
References
Becker, G. (1972) Human Capital, New York: NBER.
Blinder, A. (1976) "On Dogmatism in Human Capital Theory," Journal
of Human Resources 11, 8-22.
Goldfeld, S.M., and R.E. Quandt (1965) "Some Tests for
Homoscedasticity," Journal of the American Statistical
Association 60, 539-547.
Grasso, J., and J. Shea (1979) Vocational Education and Training:
Impact on Youth, Berkeley: Carnegie Council on Policy Studies
in Higher Education.
Heckman, J., and S. Polachek (1974) "Empirical Evidence on the
Functional Form of the Earnings-Schooling Relationship,"
Journal of the American Statistical Association 69, 350-54.
Hungerford, T., and G. Solon (1987) "Sheepskin Effects and the
Returns to Education," Review of Economics and Statistics,
175-177.
Mincer, J. (1974) Schooling, Experience and Earnings, New York:
Columbia University Press.
Psacharopoulos, G. (1980) "Returns to Education: An Updated
International Comparison," in T. King (ed.) Education and
Income, World Bank Staff Working Paper No. 402.
Spitzer, J. (1982) "A Primer on Box-Cox Estimation," Review of
Economics and Statistics, May, 307-313.
Thurow, L.C. (1980) The Zero-Sum Society, New York: Basic Books.
Winkler, D. (1986) Primary Education in Brazil, Washington, D.C.:
World Bank.
Zarembka, P. (1968) "Functional Form in the Demand for Money,"
Journal of the American Statistical Association 63, 502-511.
Appendix
The Short-Cut (Coefficient-Difference) Mincerian Method
The short-cut Mincerian method has been adopted in many empirical
studies and its use is explained in Psacharopoulos (1981), but we have not
been able to locate a formal justification. The proof which we provide is
subject to the assumptions listed in Section V, all of which are
controversial.
We assume that the education in question takes $T$ years, that the
initial earnings of the uneducated and educated are $y_0$ and $y_1$, that
earnings with $x$ years of work experience are $y_0 f(x)$ and $y_1 f(x)$, and
that the lengths of their working lives are $N_0$ and $N_1$ years,
respectively. For an individual who has the choice of being educated or
entering the labour force directly, the present discounted values of the
alternative earnings streams are

$$\int_0^{N_0} y_0 f(x)\, e^{-rx}\, dx \qquad \text{(direct entry)}$$

$$\int_0^{N_1} y_1 f(x)\, e^{-r(x+T)}\, dx \qquad \text{(entry after education)}$$

where $r$ is the rate of discount. The rate of return to education is thus
given by the solution in $r$ to

$$y_0 \int_0^{N_0} f(x)\, e^{-rx}\, dx = y_1 e^{-rT} \int_0^{N_1} f(x)\, e^{-rx}\, dx$$

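Taking logarithms of both sides and rearranging gives
$r = (\log y_1 - \log y_0)/T + (\log I_1 - \log I_0)/T$, where $I_0$ and $I_1$
denote the integrals over the working lives $N_0$ and $N_1$.
Hence, provided that the difference between the integrals is
negligible, the rate of return is given by $(\log y_1 - \log y_0)/T$. Since the
regression is run in logarithmic form, this amounts to dividing the
coefficient of the education dummy by $T$. When there are several levels of
education, the dividend is the difference between the coefficients of the
relevant schooling dummies.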
The difference between the integrals may not be negligible if work
experience has a stronger impact on earnings than education, for example,
if $f(x) = e^{ax}$ where $a$ is greater than $(\log y_1 - \log y_0)/T$. Note
that the proof does not depend upon the earnings functions being parallel
in any simple sense: it is sufficient that they be isomorphic.
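The argument can be checked numerically. The sketch below solves the rate-of-return equation by bisection for a hypothetical isomorphic profile and compares the result with the short-cut value; the function names, the experience profile, the four-year course and the working lives are all assumptions chosen for illustration.

```python
import math

def discounted_profile(f, r, n_years, steps=2000):
    """Trapezoidal approximation to the integral of f(x) * exp(-r * x) on [0, n_years]."""
    h = n_years / steps
    total = 0.5 * (f(0.0) + f(n_years) * math.exp(-r * n_years))
    for i in range(1, steps):
        x = i * h
        total += f(x) * math.exp(-r * x)
    return total * h

def rate_of_return(y0, y1, t_years, f, n0, n1, lo=1e-6, hi=1.0):
    """Bisection on r for y0 * I(n0) = y1 * exp(-r * T) * I(n1)."""
    def gap(r):
        educated = y1 * math.exp(-r * t_years) * discounted_profile(f, r, n1)
        return educated - y0 * discounted_profile(f, r, n0)
    gap_lo = gap(lo)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        gap_mid = gap(mid)
        if gap_lo * gap_mid <= 0:
            hi = mid
        else:
            lo, gap_lo = mid, gap_mid
    return 0.5 * (lo + hi)

# Hypothetical values: a 4-year course raising initial log earnings by 0.4.
profile = lambda x: math.exp(0.05 * x - 0.001 * x * x)
shortcut = 0.4 / 4
same_length = rate_of_return(1.0, math.exp(0.4), 4, profile, 40, 40)
shorter_life = rate_of_return(1.0, math.exp(0.4), 4, profile, 40, 36)
print(shortcut, same_length, shorter_life)
```

When the two working lives are equal the integrals cancel and the solution coincides with the short-cut value; when the educated worker's working life is shorter by the length of the course, the solution falls slightly below it, which is the sense in which the method requires the difference between the integrals to be negligible.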