International Bank for Reconstruction and Development
Development Research Center
Discussion Papers No. 10

EFFICIENT ESTIMATION OF THE LORENZ CURVE AND ASSOCIATED INEQUALITY MEASURES FROM GROUPED OBSERVATIONS

N.C. Kakwani and N. Podder

October 1974

NOTE: Discussion Papers are preliminary materials circulated to stimulate discussion and critical comment. References in publication to Discussion Papers should be cleared with the author(s) to protect the tentative character of these papers.

Development Research Center, The World Bank
The University of New South Wales

1. INTRODUCTION

The Lorenz curve is widely used to represent and analyse the size distribution of income and wealth. The curve relates the cumulative proportion of income units to the cumulative proportion of income received when units are arranged in ascending order of their income. The equation of the Lorenz curve can be derived from the density function of the income distribution. In practice, the density function is not known, and so far the approach has been to fit some well-known density function, for example the Pareto or the lognormal. The shortcoming of such an approach is that the well-known density functions rarely give a reasonably good fit to actual data. An alternative approach is to find an equation of the Lorenz curve which fits actual data reasonably well. The Lorenz curve has a number of properties which can be effectively utilized to specify such an equation.

The purpose of this paper is to introduce a new coordinate system for the Lorenz curve. Particular attention is paid to a special case of wide empirical validity. Four alternative methods are used to estimate the proposed Lorenz curve from grouped observations.
The well-known inequality measures are obtained as functions of the estimated parameters of the Lorenz curve. A procedure for estimating the asymptotic standard errors of the inequality measures is also provided. In addition, the frequency distribution is derived from the equation of the Lorenz curve.

A new representation of the Lorenz curve is introduced in the next section. Section 3 provides the relationship between this representation of the Lorenz curve and a number of conventional measures of income inequality. Section 4 describes a number of estimation methods. The last section reports some empirical results based on data from the Australian Survey of Consumer Expenditure and Finances (1967-68).

2. A NEW CO-ORDINATE SYSTEM FOR THE LORENZ CURVE

Suppose that the income $X$ of a family is a random variable with probability distribution function $F(x)$. Further, if it is assumed that the mean $\mu$ of the distribution exists and that $X$ is defined only for positive values,1/ the first-moment distribution function of $X$ is given by

$$F_1(x) = \frac{1}{\mu}\int_0^{x} X\,g(X)\,dX \qquad (2.1)$$

where $g(x)$ is the density function.

1/ The income $X$ can be negative for some families but is assumed to be always positive here for notational convenience.

The Lorenz curve is the relationship between $F(x)$ and $F_1(x)$. The curve is shown in Figure 1. The line $F_1 = F$ is called the egalitarian line, which in the diagram is the diagonal through the origin of the unit square. Let $P$ be any point on the curve with co-ordinates $(F, F_1)$. If

$$\pi = \frac{1}{\sqrt{2}}(F + F_1), \qquad \eta = \frac{1}{\sqrt{2}}(F - F_1) \qquad (2.2)$$

then $\eta$ is the length of the perpendicular (ordinate) from $P$ to the egalitarian line, and $\pi$ is the distance from the origin, along the egalitarian line, to the foot of that perpendicular.
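As a small illustration of the rotation (2.2), the sketch below maps a point $(F, F_1)$ of the Lorenz curve to $(\pi, \eta)$ and back. The function names `to_rotated` and `from_rotated` are hypothetical, chosen only for this example.

```python
import math

SQRT2 = math.sqrt(2.0)


def to_rotated(F: float, F1: float) -> tuple[float, float]:
    """Map a Lorenz-curve point (F, F1) to (pi, eta) using (2.2)."""
    pi = (F + F1) / SQRT2   # distance along the egalitarian line
    eta = (F - F1) / SQRT2  # perpendicular distance from the egalitarian line
    return pi, eta


def from_rotated(pi: float, eta: float) -> tuple[float, float]:
    """Invert (2.2): recover (F, F1) from (pi, eta)."""
    return (pi + eta) / SQRT2, (pi - eta) / SQRT2


# Example: half of the families receive a quarter of total income.
pi, eta = to_rotated(0.5, 0.25)
F, F1 = from_rotated(pi, eta)
```

Points on the egalitarian line ($F = F_1$) map to $\eta = 0$, and the end point $(1, 1)$ maps to $\pi = \sqrt{2}$, which is why $\pi$ ranges from zero to $\sqrt{2}$.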
Since the Lorenz curve lies below the egalitarian line, $F_1 \le F$, which implies $\eta \ge 0$. Further, if income is always positive, $F_1 \ge 0$, and equation (2.2) implies that $\eta$ is less than or equal to $\pi$. The equation of the Lorenz curve in terms of $\pi$ and $\eta$ can now be written as

$$\eta = f(\pi) \qquad (2.3)$$

where $\pi$ varies from zero to $\sqrt{2}$. If $g(x)$ is continuous, the derivatives of $F(x)$ and $F_1(x)$ exist:

$$\frac{dF}{dx} = g(x), \qquad \frac{dF_1}{dx} = \frac{x\,g(x)}{\mu}.$$

Using these values in (2.2) gives the derivatives of $\eta$ with respect to $\pi$ as

$$\frac{d\eta}{d\pi} = \frac{\mu - x}{\mu + x} \qquad (2.4)$$

$$\frac{d^2\eta}{d\pi^2} = -\frac{2\sqrt{2}\,\mu^2}{g(x)\,(\mu + x)^3} \qquad (2.5)$$

Thus $\eta$ will be maximum at $x = \mu$.

If the Lorenz curve represented by equation (2.3) is symmetric,2/ the values of $\eta$ at $\pi$ and at $(\sqrt{2} - \pi)$ should be equal for all values of $\pi$, which implies

$$f(\pi) = f(\sqrt{2} - \pi) \quad \text{for all } \pi \qquad (2.6)$$

The curve will be skewed towards (1, 1) if $f(\pi) > f(\sqrt{2} - \pi)$ for $1/\sqrt{2} < \pi < \sqrt{2}$, and it will be skewed towards (0, 0) if $f(\pi) < f(\sqrt{2} - \pi)$ for $1/\sqrt{2} < \pi < \sqrt{2}$.

2/ The symmetry of the Lorenz curve is defined with respect to the diagonal drawn perpendicular to the egalitarian line.

For instance, assume that the equation of the curve is

$$\eta = a\,\pi^{\alpha}(\sqrt{2} - \pi)^{\beta}, \qquad a > 0,\ \alpha > 0,\ \beta > 0. \qquad (2.7)$$

The restriction $a > 0$ implies that $\eta > 0$, i.e. the Lorenz curve lies below the egalitarian line. Further, $\alpha > 0$ and $\beta > 0$ mean that $\eta$ assumes the value zero when $\pi = 0$ or $\pi = \sqrt{2}$. Using (2.6), it is seen that the curve is symmetric if $\alpha = \beta$, skewed towards (1, 1) if $\alpha > \beta$, and skewed towards (0, 0) otherwise.

Further restrictions on the coefficients of (2.7) can be imposed on the basis of equations (2.4) and (2.5). If $f'(\pi)$ stands for the first derivative of $f(\pi)$ with respect to $\pi$, equation (2.4) implies that

$$\frac{1 - f'(\pi)}{1 + f'(\pi)} = \frac{x}{\mu},$$

so that for $x > 0$, $[1 - f'(\pi)]$ and $[1 + f'(\pi)]$ must be of the same sign for their ratio to be positive. Equation (2.5) means that for all values of $x$ the second derivative $f''(\pi)$ should be negative.
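The symmetry criterion (2.6) and the skewness statements for (2.7) can be checked numerically. The sketch below, with hypothetical helper names `kp_eta` and `skew_direction`, compares $f(\pi)$ with $f(\sqrt{2} - \pi)$ on a grid of points with $1/\sqrt{2} < \pi < \sqrt{2}$. It is an illustration under that grid assumption, not a proof.

```python
import math

SQRT2 = math.sqrt(2.0)


def kp_eta(pi: float, a: float, alpha: float, beta: float) -> float:
    """Equation (2.7): eta = a * pi**alpha * (sqrt(2) - pi)**beta."""
    return a * pi**alpha * (SQRT2 - pi) ** beta


def skew_direction(a: float, alpha: float, beta: float, n: int = 99) -> str:
    """Classify the curve (2.7) with the criterion stated around (2.6).

    Compares f(pi) with f(sqrt(2) - pi) at n grid points with 1/sqrt(2) < pi < sqrt(2).
    """
    half = SQRT2 / 2.0
    gaps = []
    for k in range(1, n + 1):
        pi = half + half * k / (n + 1)  # strictly inside (1/sqrt2, sqrt2)
        f_here = kp_eta(pi, a, alpha, beta)
        f_mirror = kp_eta(SQRT2 - pi, a, alpha, beta)
        gaps.append((f_here - f_mirror) / max(f_here, f_mirror))  # relative gap
    if all(abs(g) < 1e-9 for g in gaps):
        return "symmetric"
    if all(g > 0 for g in gaps):
        return "skewed towards (1, 1)"
    if all(g < 0 for g in gaps):
        return "skewed towards (0, 0)"
    return "mixed"
```

For (2.7) the ratio $f(\pi)/f(\sqrt{2} - \pi)$ equals $[\pi/(\sqrt{2} - \pi)]^{\alpha - \beta}$, so on this interval the sign of every gap is the sign of $\alpha - \beta$; the grid check merely confirms this.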
For equation (2.7), the three quantities $1 - f'(\pi)$, $1 + f'(\pi)$ and $f''(\pi)$ are obtained as

$$1 - f'(\pi) = 1 - \frac{\alpha\eta}{\pi} + \frac{\beta\eta}{\sqrt{2} - \pi} \qquad (2.8)$$

$$1 + f'(\pi) = 1 + \frac{\alpha\eta}{\pi} - \frac{\beta\eta}{\sqrt{2} - \pi} \qquad (2.9)$$

$$f''(\pi) = \eta\left[\left(\frac{\alpha}{\pi} - \frac{\beta}{\sqrt{2} - \pi}\right)^{2} - \frac{\alpha}{\pi^{2}} - \frac{\beta}{(\sqrt{2} - \pi)^{2}}\right] \qquad (2.10)$$

By (2.2), $\eta/\pi = (F - F_1)/(F + F_1) \le 1$ and $\eta/(\sqrt{2} - \pi) = (F - F_1)/(2 - F - F_1) \le 1$. It is thus obvious that the sufficient conditions for equations (2.4) and (2.5) to be satisfied are $0 < \alpha \le 1$ and $0 < \beta \le 1$. These sufficient conditions rule out the possibility of points of inflexion on the curve, which are, of course, not permissible in a Lorenz curve.

An alternative class of equations of the Lorenz curve, which looks similar to the well-known CES production function proposed by Arrow, Chenery, Minhas and Solow [1], is given by

$$\eta = a\left[\delta\,\pi^{-\rho} + (1 - \delta)(\sqrt{2} - \pi)^{-\rho}\right]^{-\nu/\rho} \qquad (2.11)$$

where the parameters $a$, $\delta$, $\rho$ and $\nu$ are all greater than zero. Rearranging equation (2.11), we obtain

$$\eta = a\,\pi^{\nu}(\sqrt{2} - \pi)^{\nu}\left[\delta(\sqrt{2} - \pi)^{\rho} + (1 - \delta)\,\pi^{\rho}\right]^{-\nu/\rho} \qquad (2.12)$$

which clearly shows that $\eta$ assumes the value zero when $\pi = 0$ and $\pi = \sqrt{2}$. The curve is symmetric if $\delta = 1/2$, skewed towards (1, 1) if $\delta > 1/2$, and skewed towards (0, 0) if $\delta < 1/2$. Further, the limit of (2.11) as $\rho$ approaches zero becomes

$$\eta = a\,\pi^{\delta\nu}(\sqrt{2} - \pi)^{\nu(1 - \delta)}$$

which is the same class of equations as (2.7) with $\alpha = \delta\nu$ and $\beta = \nu(1 - \delta)$. Finally, the sufficient conditions for equations (2.4) and (2.5) to be always satisfied for this class of equations, i.e. (2.11), are $0 < \delta < 1$ and [...].

The income density function underlying the Lorenz curve (2.3) is obtained as

$$g(x) = \frac{1}{\sqrt{2}}\left[1 + f'(\pi)\right]\frac{d\pi}{dx} = \frac{\sqrt{2}\,\mu}{\mu + x}\,\frac{d\pi}{dx} \qquad (2.14)$$

where use has been made of equations (2.2) and (2.4). Equation (2.4), written as

$$(\mu + x)\,f'(\pi) = \mu - x, \qquad (2.15)$$

gives the relationship between $\pi$ and $x$. Under the sufficient conditions discussed above, $f''(\pi) < 0$, which implies that $f'(\pi)$ is a monotonically decreasing function of $\pi$; therefore equation (2.15) can always be solved for $\pi$ in terms of $x$. Substituting the value of $\pi$ for a given value of $x$ in (2.3) and then in (2.2) gives the distribution functions $F(x)$ and $F_1(x)$.
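Because $f'(\pi)$ is monotonically decreasing under the sufficient conditions, (2.15) can be solved for $\pi$ with any bracketing root finder. The sketch below uses bisection on the curve (2.7), assuming $0 < \alpha < 1$ and $0 < \beta < 1$ (so that $f'(\pi)$ runs from $+\infty$ near $\pi = 0$ to $-\infty$ near $\pi = \sqrt{2}$ and a root always exists) and a known mean income $\mu$. All function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def kp_eta(pi, a, alpha, beta):
    """Equation (2.7)."""
    return a * pi**alpha * (SQRT2 - pi) ** beta


def kp_slope(pi, a, alpha, beta):
    """f'(pi) for (2.7): eta * (alpha / pi - beta / (sqrt(2) - pi))."""
    return kp_eta(pi, a, alpha, beta) * (alpha / pi - beta / (SQRT2 - pi))


def pi_for_income(x, mu, a, alpha, beta, tol=1e-13):
    """Solve (2.15), (mu + x) f'(pi) = mu - x, for pi by bisection."""
    target = (mu - x) / (mu + x)
    lo, hi = 1e-12, SQRT2 - 1e-12  # stay inside the open interval (0, sqrt2)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # f' is decreasing, so a slope above the target means the root lies to the right
        if kp_slope(mid, a, alpha, beta) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def distribution_at(x, mu, a, alpha, beta):
    """Return (F(x), F1(x)): pi from (2.15), eta from (2.3), then invert (2.2)."""
    pi = pi_for_income(x, mu, a, alpha, beta)
    eta = kp_eta(pi, a, alpha, beta)
    return (pi + eta) / SQRT2, (pi - eta) / SQRT2
```

At $x = \mu$ the target slope is zero, so the solver should return $\pi = \sqrt{2}\,\alpha/(\alpha + \beta)$, the point at which $\eta$ is maximum.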
Differentiating (2.15) with respect to $\pi$ gives

$$\frac{dx}{d\pi} = -\frac{(\mu + x)\,f''(\pi)}{1 + f'(\pi)} \qquad (2.16)$$

which implies that $d\pi/dx > 0$, i.e. $\pi$ increases as $x$ increases. Using the value of $\pi$ solved from (2.15) in (2.16), we obtain the value of $d\pi/dx$ in terms of $x$, which on substitution in (2.14) gives the density function $g(x)$. Thus, if the condition $f''(\pi) < 0$ is satisfied for the given equation of the Lorenz curve, it is always possible to derive the income density function underlying that equation.

3. INEQUALITY MEASURES AND THEIR DERIVATION

Among all the measures, the most widely used is Gini's concentration ratio, which is equal to twice the area between the Lorenz curve and the egalitarian line. Thus, if the Lorenz curve is formulated in terms of $\pi$ and $\eta$, the concentration ratio becomes

$$CR = 2\int_0^{\sqrt{2}} f(\pi)\,d\pi \qquad (3.1)$$

which for the specific curve (2.7) is

$$CR = 2a\,(\sqrt{2})^{1 + \alpha + \beta}\,B(1 + \alpha, 1 + \beta) \qquad (3.2)$$

where $B(1 + \alpha, 1 + \beta)$ is the Beta function, which has been widely tabulated.3/ The partial derivatives of $CR$ with respect to $a$, $\alpha$ and $\beta$ are evaluated as

$$\frac{\partial(CR)}{\partial a} = \frac{CR}{a} \qquad (3.3)$$

$$\frac{\partial(CR)}{\partial \alpha} = \left[\log\sqrt{2} + \psi(1 + \alpha) - \psi(2 + \alpha + \beta)\right](CR) \qquad (3.4)$$

$$\frac{\partial(CR)}{\partial \beta} = \left[\log\sqrt{2} + \psi(1 + \beta) - \psi(2 + \alpha + \beta)\right](CR) \qquad (3.5)$$

where $\psi(\cdot)$ is Euler's psi function, which can be computed numerically by making use of the following relationship:4/ [...]

3/ See Pearson and Johnson [9].

4/ In order to find the derivatives $\partial(CR)/\partial\alpha$ and $\partial(CR)/\partial\beta$, we require the partial derivatives of $B(1 + \alpha, 1 + \beta)$ with respect to $\alpha$ and $\beta$. Formula 4.253.1 of Gradshteyn and Ryzhik [4] is used to evaluate the integral obtained after differentiating the Beta function partially.

Using these partial derivatives, the asymptotic variance of $CR$ can now be obtained from the estimated variances and covariances of the parameter estimates $\hat a$, $\hat\alpha$ and $\hat\beta$.5/

Another important measure of inequality, well known in the literature, is the relative mean deviation.
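As an illustration of (3.2)-(3.5), the sketch below evaluates the concentration ratio for the curve (2.7), its partial derivatives, and a delta-method asymptotic standard error. It assumes that an estimated covariance matrix of $(\hat a, \hat\alpha, \hat\beta)$ is available. The digamma routine (upward recurrence followed by the standard asymptotic series) is a stand-in chosen for this example, not the relationship used in the paper, and all function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def digamma(z: float) -> float:
    """Euler's psi function for z > 0: shift z upward, then apply the asymptotic series."""
    acc = 0.0
    while z < 6.0:
        acc -= 1.0 / z  # psi(z) = psi(z + 1) - 1/z
        z += 1.0
    inv2 = 1.0 / (z * z)
    # 1/(12 z^2) - 1/(120 z^4) + 1/(252 z^6)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return acc + math.log(z) - 0.5 / z - series


def beta_fn(p: float, q: float) -> float:
    """Beta function B(p, q) computed through log-gamma."""
    return math.exp(math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q))


def concentration_ratio(a: float, alpha: float, beta: float) -> float:
    """Equation (3.2): CR = 2a (sqrt 2)^(1 + alpha + beta) B(1 + alpha, 1 + beta)."""
    return 2.0 * a * SQRT2 ** (1.0 + alpha + beta) * beta_fn(1.0 + alpha, 1.0 + beta)


def cr_gradient(a: float, alpha: float, beta: float) -> list[float]:
    """Partial derivatives (3.3)-(3.5) of CR with respect to (a, alpha, beta)."""
    cr = concentration_ratio(a, alpha, beta)
    common = math.log(SQRT2) - digamma(2.0 + alpha + beta)
    return [
        cr / a,
        (common + digamma(1.0 + alpha)) * cr,
        (common + digamma(1.0 + beta)) * cr,
    ]


def delta_method_se(grad: list[float], cov: list[list[float]]) -> float:
    """Asymptotic standard error sqrt(g' V g) of a smooth function of the parameters."""
    k = len(grad)
    var = sum(grad[i] * cov[i][j] * grad[j] for i in range(k) for j in range(k))
    return math.sqrt(var)
```

For a supplied 3×3 covariance matrix `cov`, `delta_method_se(cr_gradient(a, alpha, beta), cov)` gives the asymptotic standard error of $CR$; the same delta-method step applies to any other measure that is a smooth function of $(a, \alpha, \beta)$.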
The relative mean deviation is defined as

$$T = \frac{1}{2N\mu}\sum_{i=1}^{N}\lvert x_i - \mu\rvert \qquad (3.7)$$

where $x_i$ is the income of the $i$th family and $N$ is the number of families. It can be shown6/ that $T$ is equal to the maximum discrepancy between $F(x)$ and $F_1(x)$, which is also equal to $\sqrt{2}$ times the maximum value of $\eta$. To obtain the maximum value of $\eta$, equation (2.3) is differentiated with respect to $\pi$ and the derivative is equated to zero. Solving for $\pi$, the maximum value of $\eta$ can then be obtained from the equation of the Lorenz curve. For instance, if the equation of the Lorenz curve is (2.7), equating its derivative to zero, we obtain

$$a\alpha\,\pi^{\alpha - 1}(\sqrt{2} - \pi)^{\beta} - a\beta\,\pi^{\alpha}(\sqrt{2} - \pi)^{\beta - 1} = 0 \qquad (3.8)$$

which gives $\pi = \sqrt{2}\,\alpha/(\alpha + \beta)$; therefore the relative mean deviation will be

$$T = a\,(\sqrt{2})^{1 + \alpha + \beta}\,\frac{\alpha^{\alpha}\,\beta^{\beta}}{(\alpha + \beta)^{\alpha + \beta}} \qquad (3.9)$$

5/ See Kakwani and Podder [6].

6/ Cf. Gastwirth [3].

Again, if the variances and covariances of $\hat a$, $\hat\alpha$ and $\hat\beta$ are known, it is possible to compute the asymptotic variance of $T$.

Elteto and Frigyes [2] have recently proposed a set of three new inequality measures which can easily be computed from the equation of the Lorenz curve (2.3) by using the value of $\pi$ at which $\eta$ is maximum. The derivation of the Elteto and Frigyes [2] measures is thus similar to that of the relative mean deviation. Recently, Kondor [7] has shown that these measures do not convey much more information than the relative mean deviation, and it is therefore unnecessary to discuss their derivation here. However, the numerical values of these measures, along with their asymptotic standard errors, have been computed using Australian data in Section 5. Further, the estimated Lorenz curve (2.3) can be used to obtain any percentile of the distribution. To illustrate this point, the estimated shares of income going to the poorest and richest 5 and 15 per cent of families have been computed in Section 5.

4. ON THE ESTIMATION OF THE LORENZ CURVE

The estimation of the Lorenz curve from grouped observations is considered here. Suppose there are $N$ families which have been grouped into $(T + 1)$ income classes, viz. $(0$ to $x_1)$, $(x_1$ to $x_2)$, ..., $(x_T$ to $x_{T+1})$. Let $n_t$ be the number of families earning income in the interval $x_{t-1}$ to $x_t$; then $f_t = n_t/N$ is the relative frequency, and $f_t$ is a consistent estimator of the probability $\phi_t$ of a family belonging to the $t$th income group.7/

If $\bar x_t$ is the sample mean income of the $t$th income group, then consistent estimates of $F(x_t)$ and $F_1(x_t)$ are

$$p_t = \sum_{\tau=1}^{t} f_\tau \quad\text{and}\quad q_t = \frac{1}{\bar x}\sum_{\tau=1}^{t} f_\tau\,\bar x_\tau$$

respectively, where $t = 1, 2, \ldots, T$ and $\bar x = \sum_{\tau=1}^{T+1} f_\tau\,\bar x_\tau$ is the mean income of all the families. Now, using equation (2.2), consistent estimators of $\pi_t$ and $\eta_t$ are obtained as

$$r_t = \frac{p_t + q_t}{\sqrt{2}} \quad\text{and}\quad y_t = \frac{p_t - q_t}{\sqrt{2}}$$

respectively. ($r_t$ and $y_t$ differ from $\pi_t$ and $\eta_t$ by random disturbance terms.) The equation of the Lorenz curve (2.7) in terms of the observations on $r_t$ and $y_t$ can then be written as

$$\log y_t = \alpha' + \alpha\log r_t + \beta\log(\sqrt{2} - r_t) + w_{1t} \qquad (4.3)$$

where $\alpha' = \log a$ and $w_{1t}$ is the random disturbance, which can be shown to be of order $N^{-1/2}$ in probability.8/ In what follows, it will be useful to write the above equation in vector and matrix notation as

$$Y_1 = X_1\delta + w_1 \qquad (4.4)$$

where $Y_1$ is a $T \times 1$ vector of $T$ observations on $\log y_t$, $X_1$ is a $T \times 3$ matrix of $T$ observations on the right-hand-side variables of (4.3), $w_1$ is the vector of $T$ observations on the disturbance term, and $\delta$ is the vector consisting of the three elements $\alpha'$, $\alpha$ and $\beta$. Then the least-squares estimator of $\delta$ is

$$\hat\delta = (X_1'X_1)^{-1}X_1'Y_1$$

which will be referred to as Method I in subsequent discussions.

8/ See Kakwani and Podder [6].

Following Kakwani and Podder [6], it can be shown that $\hat\delta$ is a consistent estimator of $\delta$ and that its asymptotic variance-covariance matrix is given by

$$\operatorname{var}(\hat\delta) = (X_1'X_1)^{-1}X_1'\,\Omega_{11}\,X_1\,(X_1'X_1)^{-1}$$

where $\Omega_{11}$ is the variance-covariance matrix of $w_1$.
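The first steps of the estimation procedure can be sketched in a few lines. The code below converts class counts and class mean incomes into the points $(r_t, y_t)$ and then fits (4.3) by ordinary least squares (Method I). It is a minimal illustration under stated assumptions: the function names are hypothetical, the normal equations are solved with a small Gaussian-elimination helper, and no standard errors are computed, since the variance formula requires $\Omega_{11}$ from Kakwani and Podder [6].

```python
import math

SQRT2 = math.sqrt(2.0)


def lorenz_points(counts, group_means):
    """Build r_t and y_t from class counts n_t and class mean incomes.

    The last class is dropped because there p = q = 1 carries no information.
    """
    N = float(sum(counts))
    f = [n / N for n in counts]                            # relative frequencies f_t
    xbar = sum(ft * mt for ft, mt in zip(f, group_means))  # overall mean income
    r, y = [], []
    p = q = 0.0
    for ft, mt in zip(f[:-1], group_means[:-1]):
        p += ft              # cumulative proportion of families, p_t
        q += ft * mt / xbar  # cumulative proportion of income, q_t
        r.append((p + q) / SQRT2)
        y.append((p - q) / SQRT2)
    return r, y


def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= factor * M[col][j]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def ols(X, Y):
    """Ordinary least squares through the normal equations (X'X) d = X'Y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    XtY = [sum(row[i] * yv for row, yv in zip(X, Y)) for i in range(k)]
    return solve_linear(XtX, XtY)


def method_one(r, y):
    """Method I: least squares on (4.3); returns (a, alpha, beta) with a = exp(alpha')."""
    X = [[1.0, math.log(rt), math.log(SQRT2 - rt)] for rt in r]
    Y = [math.log(yt) for yt in y]
    alpha_prime, alpha, beta = ols(X, Y)
    return math.exp(alpha_prime), alpha, beta
```

With survey tables, `counts` and `group_means` would come directly from the income classes; at least three interior points are needed to identify the three coefficients.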
However, an asymptotically more efficient estimator of $\delta$ is

$$\hat\delta^{*} = (X_1'\Omega_{11}^{-1}X_1)^{-1}X_1'\Omega_{11}^{-1}Y_1$$

which can also be shown to be consistent; its asymptotic variance-covariance matrix is

$$\operatorname{var}(\hat\delta^{*}) = (X_1'\Omega_{11}^{-1}X_1)^{-1}$$

This generalized least-squares method will be referred to as Method II.

Information on income ranges is available for most income distributions, and it can be effectively utilized to improve the precision of the estimates. To show this, we consider equation (2.4), which for the Lorenz curve (2.7) can be written as

$$\eta\left(\frac{\alpha}{\pi} - \frac{\beta}{\sqrt{2} - \pi}\right) = \frac{\mu - x}{\mu + x}$$

Substituting the estimates of $\pi_t$, $\eta_t$ and $\mu$, the above equation becomes

$$\frac{\bar x - x_t}{\bar x + x_t} = \alpha\,\frac{y_t}{r_t} - \beta\,\frac{y_t}{\sqrt{2} - r_t} + w_{2t} \qquad (4.11)$$

where $w_{2t}$ is the random error, which can again be shown to be of order $N^{-1/2}$ in probability. Writing (4.11) in vector and matrix notation gives

$$Y_2 = X_2\delta + w_2 \qquad (4.12)$$

where $Y_2$ is a column vector of $T$ observations on the dependent variable in equation (4.11); $X_2$ is a $T \times 3$ matrix whose first column consists of zeros and whose other two columns contain the $T$ observations on the explanatory variables $y_t/r_t$ and $-y_t/(\sqrt{2} - r_t)$ of equation (4.11); and $w_2$ is the vector of stochastic disturbances. Equations (4.4) and (4.12) can now be combined as

$$\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}\delta + \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}, \quad\text{i.e.}\quad Y = X\delta + w \qquad (4.13)$$

where $w$ is now the vector of $2T$ disturbances with zero mean and covariance matrix

$$\Omega = \begin{bmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22} \end{bmatrix}$$

The coefficient vector $\delta$ can now be estimated from (4.13) by direct least squares, which will be referred to as Method III. However, an asymptotically more efficient estimator of $\delta$ is

$$\tilde\delta^{*} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y \qquad (4.16)$$

with asymptotic variance-covariance matrix

$$\operatorname{var}(\tilde\delta^{*}) = (X'\Omega^{-1}X)^{-1} \qquad (4.17)$$

The estimator $\tilde\delta^{*}$ will be referred to as Method IV.9/

9/ The above procedure of estimating the parameters by combining two stochastic equations has been used earlier by Theil [11] in connection with mixed estimation and by Zellner [12] in connection with seemingly unrelated regressions.
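To make the stacking in (4.13) concrete, the sketch below builds the $2T$ rows of $X$ and $Y$ from (4.3) and (4.11) and applies ordinary least squares (Method III). The generalized least-squares Methods II and IV additionally require $\Omega_{11}$ or $\Omega$, whose derivation is given in Kakwani and Podder [6] and is not reproduced here. The function names, and passing the class limits $x_t$ and the mean $\bar x$ as inputs, are assumptions of this illustration.

```python
import math

SQRT2 = math.sqrt(2.0)


def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= factor * M[col][j]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def ols(X, Y):
    """Ordinary least squares through the normal equations (X'X) d = X'Y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    XtY = [sum(row[i] * yv for row, yv in zip(X, Y)) for i in range(k)]
    return solve_linear(XtX, XtY)


def method_three(r, y, x_upper, xbar):
    """Method III: least squares on the stacked system (4.13).

    r, y    : estimates r_t, y_t of pi_t, eta_t (t = 1..T)
    x_upper : class limits x_t at which r_t and y_t are observed
    xbar    : overall mean income
    """
    X, Y = [], []
    for rt, yt in zip(r, y):  # rows from (4.3)
        X.append([1.0, math.log(rt), math.log(SQRT2 - rt)])
        Y.append(math.log(yt))
    for rt, yt, xt in zip(r, y, x_upper):  # rows from (4.11); alpha' does not enter
        X.append([0.0, yt / rt, -yt / (SQRT2 - rt)])
        Y.append((xbar - xt) / (xbar + xt))
    alpha_prime, alpha, beta = ols(X, Y)
    return math.exp(alpha_prime), alpha, beta
```

The zero in the first column of the second block reflects the fact that $\alpha'$ does not appear in (4.11), so the extra rows sharpen the estimates of $\alpha$ and $\beta$ only through their common coefficients with (4.3).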
It can be demonstrated that the estimators of the coefficient vector $\delta$ obtained from the combined equation are more efficient than those obtained from the individual equation (4.4).

5. SOME EMPIRICAL RESULTS