A ‘data revolution’ for sustainable development leaves gaps on inequality

Among the many things said about the Sustainable Development Goals (SDGs) is the description by the President of the UN General Assembly’s 70th session, Mogens Lykketoft, that the SDGs represented ‘an unprecedented statistical challenge’. In addition to the 17 goals, there are 169 targets and 232 accompanying indicators for statistical agencies and specialists to fret over.

A World that Counts, the UN report that lays out a strategy for a data revolution to accompany the SDGs, states:

‘Data are the lifeblood of decision-making and the raw material for accountability. Without high-quality data providing the right information on the right things at the right time; designing, monitoring and evaluating effective policies becomes almost impossible.’

Statistics are so important to public policy that the World Bank even keeps data on how well each country keeps data.

Major improvements are needed to advance inclusion

In recent years and in line with UN goals, data on poverty, economic inequality, and a host of other important issues have improved tremendously. The promise of these improvements should not be understated. But at the same time, we should be realistic about the ability of quantitative data to deliver on SDG 10: reduced inequalities.

There are key methodological, conceptual, and, in particular, political issues that pose persistent challenges for studying and monitoring inequalities between groups of people. These challenges are rooted in potentially contentious questions about which data to collect in surveys and censuses. They imply not only real limits on using the available data to address social and economic exclusion, but also serious risks to ‘evidence-based’ policy making, especially if decision-makers rely too heavily on results derived solely from quantitative analysis.

Given the limitations of the existing data, some of which I outline below, a deep responsibility falls to knowledge producers and knowledge communicators in and around the UN system and other policy-making spheres not to overstate how well quantitative evidence can describe the experience of inequality within and between groups in most countries of the world.

Limitations of data for addressing group-based inequalities

Nate Silver, the statistician who founded 538 and made a name for himself using data to forecast the performance of baseball players and the outcomes of US elections, has examined how well data can be used to make predictions across a range of fields. Weather forecasts, he argues, have become very good. Economic forecasts, not so much.

With weather data, we know what to collect and why. With data about human societies, what drives outcomes is far less clear. In the social sciences, what data to collect, on whom, and why is neither obvious nor apolitical.

When our core research questions concern the inclusion of, and access for, excluded groups, what happens when the existing data themselves exclude these groups?

The measurement of inequality in general is constrained by data availability, data quality, lack of standardization, and differences in definitions, classifications, and methodologies. Beyond these standard constraints, producing horizontal inequality indicators (measures of inequality between culturally defined groups, such as ethnic or religious groups), which I rely on in my work, is made harder by poor data on ethnic groups: for many countries, such data are unavailable, significantly incomplete, or otherwise problematic.

Data are simply not available

In a recent project, my colleagues and I examined data for 15 developing countries, including many of the largest. Our country studies show clearly that, even with careful, focused analysis, it is not uncommon for significant gaps in our understanding to remain. The reason is that survey and census data simply do not contain the information on politically relevant ascriptive groups needed to produce valid and reliable quantitative measures of horizontal inequality. For instance, recent official statistics are insufficient on ethnicity and religion in Tanzania, on religious sect (for example, Sunni or Shi’a) in Iran, and on caste in India.

In many countries, there are political reasons for these omissions. Some countries are quite explicit about them. In Rwanda after the genocide, for instance, collecting information on ethnicity is considered nationally divisive. The official line is: ‘There is no ethnicity here. We are all Rwandan.’

Data are significantly incomplete

Even when data are available, a major hurdle remains in obtaining robust quantitative results on smaller minority groups: small sample size. To study most questions about society, survey data are drawn from a representative sample of the population. But in such surveys, the number of respondents from a given minority group may be too small to support statistically significant conclusions.
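
To make the sample-size problem concrete, here is a minimal Python sketch using purely hypothetical numbers: a national survey of 5,000 respondents, a group making up 1.5% of the population, and a poverty rate of 30% to be estimated. None of these figures come from our studies. It computes the usual normal-approximation 95% margin of error for a proportion, assuming simple random sampling; real surveys with clustered designs would typically produce even wider intervals.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation 95% confidence interval for a
    proportion p estimated from n respondents.

    Assumes simple random sampling (no design effect), which is a
    simplification for most household surveys.
    """
    return z * math.sqrt(p * (1 - p) / n)


SAMPLE_SIZE = 5000       # hypothetical national survey size
MINORITY_SHARE = 0.015   # hypothetical group at 1.5% of the population
POVERTY_RATE = 0.30      # hypothetical poverty rate being estimated

# Expected number of respondents from the minority group in the sample.
n_minority = round(SAMPLE_SIZE * MINORITY_SHARE)

print(f"Full sample (n={SAMPLE_SIZE}): "
      f"+/- {margin_of_error(POVERTY_RATE, SAMPLE_SIZE):.3f}")
print(f"Minority group (n={n_minority}): "
      f"+/- {margin_of_error(POVERTY_RATE, n_minority):.3f}")
```

With these assumed numbers, the group contributes only about 75 respondents, and the margin of error on its poverty rate is roughly eight times wider than for the full sample (about ±10 percentage points rather than about ±1.3), which is often too imprecise to detect meaningful gaps between groups.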

For instance, Vietnam officially has 54 ethnic groups, 53 of which each make up less than 2% of the population. Many analyses, like this one, therefore treat ethnicity in terms of just two groups: Kinh and ‘other’ (or ‘non-Kinh’). Treating multiple minority groups as a single group is often the only practical option, but such aggregate categories obscure the diversity that may exist within them. Both the data collected and the way they are treated can hide this diversity rather than illuminate it for decision-makers.
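
How aggregation can hide inequality can be illustrated with one commonly used horizontal inequality measure, the group-weighted coefficient of variation (GCOV), which weights each group's squared deviation from the overall mean by that group's population share. The sketch below uses three hypothetical groups: a majority and two equally sized minorities, one poorer and one richer than average. The helper names `gcov` and `pool_groups` and all figures are illustrative assumptions, not the indicators or data from our project.

```python
import math


def gcov(groups: dict[str, tuple[float, float]]) -> float:
    """Group-weighted coefficient of variation.

    `groups` maps a group name to (population share, mean income);
    shares are assumed to sum to 1.
    """
    overall_mean = sum(share * mean for share, mean in groups.values())
    # Population-weighted variance of group means around the overall mean.
    variance = sum(share * (mean - overall_mean) ** 2
                   for share, mean in groups.values())
    return math.sqrt(variance) / overall_mean


def pool_groups(groups: dict[str, tuple[float, float]],
                names: list[str], new_name: str) -> dict[str, tuple[float, float]]:
    """Merge several groups into one, weighting their means by population share."""
    pooled_share = sum(groups[n][0] for n in names)
    pooled_mean = sum(groups[n][0] * groups[n][1] for n in names) / pooled_share
    result = {k: v for k, v in groups.items() if k not in names}
    result[new_name] = (pooled_share, pooled_mean)
    return result


# Hypothetical groups: (population share, mean income).
groups = {
    "majority": (0.8, 100.0),
    "minority_a": (0.1, 60.0),   # poorer than average
    "minority_b": (0.1, 140.0),  # richer than average
}
aggregated = pool_groups(groups, ["minority_a", "minority_b"], "other")

print(f"GCOV, three groups: {gcov(groups):.3f}")
print(f"GCOV, two groups:   {gcov(aggregated):.3f}")
```

With these made-up figures, the disaggregated GCOV is about 0.18, but once the two minorities are pooled into a single ‘other’ category their opposite gaps cancel out, and the measured between-group inequality falls to essentially zero.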

Conceptual challenges

A further set of issues stems from the conceptual challenge of capturing identities in statistics, and many of these issues have ethical or political dimensions. How do we decide which groups have political, social, and economic salience? Who gets to decide? Why does a census or survey enumerate some groups and not others, and why does this change over time? For instance, the Philippine census identified 147 ethnolinguistic groups and 93 religions in 2000, but 182 and 97, respectively, in 2010. In studying ethnic inequality in the Philippines, should we consider inequality among these 147 (or 182) groups? Or is it more appropriate, as one team of researchers did, to aggregate them into three broader ‘politically salient’ ethno-religious groups (Muslims, indigenous persons, and everyone else), thereby reclassifying the categories listed in the census?

Official data from South Africa further illustrate the potentially contentious and political nature of these decisions. The national statistics office reports data on five racial categories: Black, White, Coloured, Indian, and Other. Not only do these categories aggregate multiple racial and ethnolinguistic identities, obscuring diversity and potential inequality within each category, but they also maintain the racial coding developed by the apartheid state.

On the one hand, many South Africans find this system of racial categorization problematic. On the other hand, there is a practical reason to keep collecting these data: they allow South Africa to monitor progress in reducing the very racial inequalities that the apartheid state created.

As the saying goes, ‘Not everything that counts can be counted, and not everything that can be counted counts’. Notwithstanding the significant progress made in the ‘data revolution’ for sustainable development, continued investment in more and better statistics will only get us so far in understanding and monitoring inequality. Especially when we consider inequality between groups, there are inherent limits and biases in what the numbers alone can tell us.


The views expressed in this piece are those of the author(s), and do not necessarily reflect the views of the Institute or the United Nations University, nor the programme/project donors.