Research & Policy Briefs
From the World Bank Malaysia Hub
No. 7, May 2017

Open Data: Differences and Implications across Countries

Lay Lian Chuah and Norman V. Loayza

The pros and cons of making data more accessible to the public have been widely debated. Proponents of open data argue that it is good practice for governments because it fosters transparency, promotes greater participation, and encourages sharing of ideas, which is important in building a research-oriented culture. Others, however, are less convinced of the merits of open data. This research policy brief finds that there is a relationship between accessibility of data and income levels of a country, and between data availability and the productivity and quality of economic research.

The Merits of Open Data

Advocates of open access to public data argue that open data nurtures research; spurs the sharing of ideas; and helps individuals, firms, and policy makers make informed decisions that lead to improved outcomes. While the benefits of open data seem compelling, trust and privacy issues, as well as lack of expertise, resources, and technological capabilities, continue to act as barriers to open data practices.

While commitment to open data initiatives and the efforts made to achieve them vary widely around the world, many countries are making great strides in making data as accessible as possible (Neubauer 2013). However, other countries are less convinced of the merits of providing complete access to data. This brief discusses and provides empirical evidence on two key questions related to the merits of open data: Is there a relationship between accessibility of data and income levels of a country? And is there a relationship between data availability and research productivity and quality?

Two international assessments of data openness around the world, Open Data Barometer (ODB) and Open Knowledge International (OKI), define open data as public information that can be “freely used and shared by anyone for any purpose.” Data quality is equally important. Together, the availability, accessibility, and quality of data determine the usefulness and usability of data. Users gain significant value from having accurate, complete, easy to interpret, and timely data to solve problems or make decisions.

While in many countries the national statistical agencies may not be the only custodians of key public data, the national government is nonetheless responsible for ensuring the open publication of such data. Some data-collecting agencies consider data protection to be important in maintaining the trust of the establishments they survey and in eliciting truthful responses from them. However, data that are inaccessible represent a locked resource from which value cannot be fully extracted. By unlocking data, the government can draw on the creative and rigorous policy recommendations of the research community for its policy analysis and planning.

How accessible are data in different countries?

ODB and OKI assess the state of open data initiatives globally. The two sets of rankings differ in their methodology and coverage. ODB covers 92 countries and 15 types of datasets. It computes a country’s ranking based on three dimensions: a country’s readiness to support and respond to the positive outcomes from open data initiatives; the implementation of open data practices; and the impact of open data on governments, societies, and economies. ODB also combines experts’ opinions, technical assessments of data supply, and secondary data for the construction of its open data rankings. OKI covers 122 economies and 13 types of datasets, and measures openness based solely on data accessibility.

Figure 1. Data Accessibility across Countries by Type of Data and Income Group
[Bar chart. Vertical axis: number of countries. Horizontal axis: datasets assessed by Open Data Barometer (D1 Map, D2 Land, D4 Census, D5 Budget, D6 Spending, D7 Company, D8 Legislation, D9 Transport, D10 Trade, D11 Health, D12 Education, D13 Crime, D14 Environment, D15 Elections, D16 Contracts). Series: Low-Income Countries (LIC), Low-Middle-Income Countries (LMIC), Upper-Middle-Income Countries (UMIC), High-Income Countries (HIC).]
Source: Open Data Barometer (ODB), 2015.

Affiliation: Development Research Group, the World Bank.
Acknowledgement: Kenneth Simler and Nancy Morrison contributed to this brief with insightful comments and suggestions.
Objective and disclaimer: Research & Policy Briefs synthesize existing research and data to shed light on a useful and interesting question for policy debate. Research & Policy Briefs carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions are entirely those of the authors. They do not necessarily represent the views of the World Bank Group, its Executive Directors, or the governments they represent.

Global Knowledge & Research Hub in Malaysia

Figure 2. Openness Gap in Data between Upper-Middle-Income Countries and High-Income Countries
[Bar chart. Vertical axis: ranking (a higher number indicates deteriorating performance). Horizontal axis: countries, grouped into high-income and upper-middle-income countries. Marked on the chart: the average rank for high-income countries, excluding the Middle East; the average rank for upper-middle-income countries (51); and the openness gap between the two.]
Source: Open Data Barometer, 2015.
Note: The openness gap is the difference between the average rank for upper-middle-income countries and high-income countries.
Dark blue bars are high-income countries in the Middle East; sky blue bars are high-income countries in Europe and North America; light green bars are high-income countries in the Asia Pacific; and dark green bars are high-income countries in Latin America. Orange bars are upper-middle-income countries in Latin America. Pale orange bars are upper-middle-income countries in Central Asia, Africa, and the Caribbean. Red bars are upper-middle-income countries in Asia.

According to ODB’s 2015 assessment, 55 percent of the countries surveyed have an open data initiative in place; however, only 10 percent of government data is freely accessible. Twenty-six of the top 30 countries in the ranking are high-income countries. Half the open datasets are found in just the top 10 member countries of the Organisation for Economic Co-operation and Development (OECD), while almost none are in Africa. The rankings reveal a large openness gap between high- and low-income countries. Similar patterns also emerge in the OKI rankings for 2015. High-income countries occupy 23 out of the top 30 places. Taiwan, China tops the 2015 ranking and became the first non-European economy to be placed in the top three.

Data accessibility can be assessed at three levels: (1) public data do not exist; (2) public data do exist but are somehow inaccessible; and (3) public data exist and are accessible. According to ODB, only two of the 92 countries in its rankings, the United Kingdom and Japan, make all of their existing 15 types of datasets available to the public. Slightly more than half the countries have at least one or two types of datasets that exist but are not available to the public (figure 1). Of the 1,380 datasets for 92 countries surveyed, only 18.7 percent (256) of datasets that exist are not accessible to the public; they mainly cover government spending and land data.

The first 32 spots in the ranking are mainly filled by high-income countries, excluding countries of the Middle East. The average ranking for high-income countries is 18 (figure 2). The upper-middle-income countries tend to occupy the 16th to 78th spots. However, Latin American countries such as Mexico, Brazil, and Colombia (in the top 30) perform more favorably than other countries at similar income levels. The average ranking for upper-middle-income countries is 51. Thus there is an “openness gap” of 33 notches (51 minus 18) between the upper-middle-income countries and the high-income countries. ODB attributes this gap mainly to lower scores for the sustainable publication of data, discoverability of data, and links to key datasets.

In Asia the majority of the upper-middle-income countries are concentrated in the 40th to 80th spots, according to the OKI 2015 rankings, which show Malaysia and China falling outside the upper-middle-income band. The openness gap between the upper-middle-income and high-income countries in the region is 44 notches, wider than the corresponding gap in non-Asian countries. The disparity in open data initiatives for countries in Asia is higher than for countries outside this region.

Is there a relationship between accessibility of data and income levels?

The positive relationship between data access and income levels suggests that greater access to data allows people to be more informed and efficient in solving problems, which leads to better outcomes. However, it could also be the case that the causality runs in the other direction: that is, higher-income countries have the resources to invest in open data initiatives.

The correlation between a country’s ranking in data openness and GDP per capita is -0.75 (figure 3). Using the averages of the upper-middle-income rankings and GDP per capita as benchmarks suggests that countries such as Malaysia, Thailand, and Venezuela (upper-right quadrant) have underperformed with respect to countries in a similar GDP band. Among the upper-middle-income countries, Ecuador and South Africa are slight outperformers.

Figure 3. Data Accessibility and Income per Capita
[Scatter plot of countries. Vertical axis: ranking. Horizontal axis: log GDP per capita. Reference lines mark the average rank and the average income per capita for upper-middle-income countries; the resulting quadrants are labeled “Underperformers” and “Outperformers.”]
Source: Open Data Barometer, 2015, and the World Development Indicators (WDI) database.
Note: Log GDP per capita is in terms of purchasing power parity (PPP) in constant 2011 international dollars.

Is there a relationship between data availability and research productivity and quality?

The field of economics has evolved over the past several decades toward greater emphasis on empirical work. While economic theory provides a conceptual framework, better data facilitate more rigorous testing of theories and assessment of their relevance (Einav and Levin 2014, Jin 2009). In a growing number of cases, more granular data are needed (McGuckin 1993); that is, disaggregated data coming from household, labor, and firm surveys and censuses, for instance.

Until the mid-1980s, the majority of papers published in the top three economic journals (American Economic Review, Journal of Political Economy, and Quarterly Journal of Economics) were theoretical, according to a review of publication patterns from 1963 to 2011 (Hamermesh 2013). However, the share of empirical papers in top journals had climbed to more than 70 percent by 2011. These empirical papers use data that have been assembled by public agencies, obtained directly by the authors, or generated through controlled experiments. Importantly, granular data are being used to pose new questions. As a result, they enable new research designs that can offer insights into the consequences of different economic policies and events.

We consider the link between countries’ quality of research and the open data ranking compiled by OKI for countries in the Asia Pacific region. As a proxy for quality, the analysis uses the number of publications and citations of journal articles published in the top 10 high-impact economic journals in the past 10 years by countries in the sample. This approach is adapted from Aizenman et al. (2011). The high-impact journal articles are selected based on their ranking by a database dedicated to economic research, IDEAS, which computes citation counts and various measures of a publication’s impact. The articles published by the top 10 journals between 1995 and 2015 are summed and divided by the average population during the same period to yield a citation ratio. Because the ratios are very small, they are multiplied by 10 million before converting into logs (figure 4).

As shown in panel a, there is a positive correlation of 0.46 between the OKI’s open data scores and log publication per capita of the Asia Pacific economies. High-income countries (shown by red dots) with higher open data scores have a higher publication per capita (1.9 average, compared to the overall average of 0.8).

The association between the citation ratio and open data is positive (+0.51). This is consistent with the idea that data openness is needed to produce quality research (panel b). High-income countries tend to produce higher quality research. For every article published by high-income countries, an average of 77.3 citations are registered, compared to 46.1 citations for upper-middle-income countries; 66.1 citations for lower-middle-income countries; and 34.3 citations for the only low-income country in the sample, Nepal. The results suggest that quality publications may be associated with open data and in turn appear to be associated with higher income levels.

Figure 4. Data Openness is Positively Correlated with Publication per Capita and Citation Ratio
a. Open data score and log publication per capita
b. Open data score and citation ratio
[Two scatter plots of Asia Pacific economies. Horizontal axis in both panels: open data score (0 to 100). Vertical axis: log publication per capita (panel a) and citation ratio (panel b). Each panel marks the sample average.]
Source: Open Knowledge International (OKI) and IDEAS.
Note: Higher open data scores result in better rankings. TWN = Taiwan, China. Red dots = high-income countries. Blue dots = upper-middle, low-middle, and low-income countries.

Box 1. How Data Users Perceive the Accessibility and Quality of Public Data in Malaysia

A survey conducted in September 2016 for this Brief set out to answer questions on the accessibility and quality of public data. The three-week survey sampled 831 people, of whom about one-quarter (28 percent) responded. Nearly half the respondents (46.1 percent) are in the government or public administrative sector, while 18.3 percent work in education, 16.5 percent work in finance, and 11.7 percent work in nonprofit organizations, including think tanks. By profession, 39.8 percent are either analysts or consultants, 28.1 percent are in management positions, and 24.2 percent are academicians or researchers.

Almost 60 percent of the respondents have at least 11 years of work experience. An overwhelming majority (95.7 percent) are comfortable using computers. The majority of the respondents (70 percent) have found publicly available data relatively easy to access online. Only 30 percent of the respondents found it to be difficult. Approximately half of the respondents also consider the quality and format to be average.

On the other hand, most respondents (89.5 percent) reported that the data were not adequate in terms of the granularity needed for rigorous economic research (figure B1.1a). Granularity corresponds to disaggregated data, at the individual, household, worker, or firm levels. Of those who consider access to sufficiently disaggregated data to be inadequate, 68 percent work in professions that use data intensively, including academicians, researchers, analysts, and consultants. Among the academicians and researchers who consider data to be inadequate, 61.3 percent also consider public data not easily accessible (figure B1.1b). About 61.3 percent of the respondents who indicated that the data were not granular enough also found all data to be of average quality. More than 75 percent of the respondents agree or strongly agree that availability of data contributes to research capacity in Malaysia.

Figure B1.1 Access to Granular Data (e.g. household and firm-level data) in Malaysia is a Constraint to Research
a. Adequacy of granularity of data (percent of all respondents)
b. Degree of accessibility of online public data, for those respondents who consider data granularity inadequate (percent)
[Bar charts by profession: Academia/Researcher, Analyst/Consultant, Management, Other/Student. Panel a categories: Adequate, Somewhat adequate, Inadequate. Panel b categories: Easy, Average, Not Easy.]
Source: The World Bank.

Conclusion

Public data are an asset. Making them available, usable, and discoverable (that is, “open”) promotes efficiencies, increases transparency, creates economic opportunities, and increases people’s participation in the contribution of ideas (www.data.gov). However, some countries, concerned about confidentiality and data abuse, remain unconvinced about the benefits of making data more publicly accessible. These diverging opinions are reflected in the uneven progress in making data accessible to the public. Based on the rankings provided by ODB and OKI, high-income countries appear to have made better progress in terms of readiness to adopt and implement open data practices.

Data are vital for research. The analysis shows that there is a positive association between research productivity and data accessibility. The association is also positive between quality of research and data accessibility.

The movement towards open data is gaining traction. Countries need to be prepared to devote resources to strengthen data management and programming capacities, plan for unintended consequences, and engage communities to harness the potential of open data. A challenge is to develop methods for researchers to access data in ways that respect privacy and confidentiality.

References

Aizenman, J., H. Edison, L. Leony, and Y. Sun. 2011. “Evaluating the Quality of IMF Research: A Citation Study.” Background Paper BP/11/01, Independent Evaluation Office, International Monetary Fund, Washington, DC.

Einav, L., and J. Levin. 2014. “Economics in the Age of Big Data.” Science 346 (6210): 715–821.

Hamermesh, D. S. 2013. “Six Decades of Top Economics Publishing: Who and How?” Journal of Economic Literature 51 (1): 162–72.

Jin, J. C. 2009. “Economic Research and Economic Growth: Evidence of East Asian Countries.” Journal of Asian Economics 20 (2): 150–55.

McGuckin, R. H. 1993. “The Importance of Establishment Data in Economic Research.” Working Paper 93-10, Center for Economic Studies, U.S. Census Bureau, Washington, DC.

Neubauer, M. 2013. “New Report Highlights Successes and Challenges of Worldwide Open Data Policies.” Techpresident.com, November 1. http://techpresident.com/news/wegov/24480/new-report-highlights-successes-and-challenges-worldwide-open-data-policies.

Open Knowledge International. Global Open Data Index. http://2015.index.okfn.org/.

Open Data Barometer. http://opendatabarometer.org/barometer/.
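
As a worked illustration of the scaling step described in the section on research productivity (a count summed over the period, divided by average population, multiplied by 10 million, then converted to logs), the short Python sketch below may help. The function name, the example figures, and the choice of the natural logarithm are assumptions made for illustration only; the brief does not state the log base or publish the underlying counts.

```python
import math

# The brief multiplies the very small per-capita ratios by 10 million
# before taking logs.
SCALE = 10_000_000


def log_scaled_per_capita(total_count: float, average_population: float) -> float:
    """Return log(total_count / average_population * 10 million).

    The natural log is an assumption; the brief does not state the base.
    """
    if total_count <= 0 or average_population <= 0:
        raise ValueError("count and population must both be positive")
    return math.log(total_count / average_population * SCALE)


# Hypothetical economy (not from the brief): 50 articles over the period
# and an average population of 25 million, i.e. 20 articles per 10 million people.
example = log_scaled_per_capita(50, 25_000_000)
print(round(example, 3))
```

Scaling before taking logs keeps the plotted values in a small, readable range, which appears to be why panel a of figure 4 reports values of roughly -1 to 4 rather than very large negative numbers.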