Working Paper
Inequality and institutional outcomes in Viet Nam

A combined principal components and clustering analysis

Better understanding of inequality, including its relationship to governance and other key outcomes, is relevant both to academic researchers and to policy-makers. Nevertheless, efforts to establish causal relationships empirically remain hampered by the quality and availability of data, especially for Global South countries at the sub-national level. 

This paper draws on newly available provincial-level data on income inequality in Viet Nam to show how unsupervised learning techniques can be used to explore the relationship between inequality and governance. Previous empirical work in this area has largely relied on standard techniques such as regression analysis aimed at establishing causal relationships, an approach that is particularly exposed to these data constraints.

Adopting a different approach, this paper applies K-means clustering and principal components analysis (PCA) to show how unsupervised learning techniques can provide relevant insight into structures and patterns in data. Using PCA, it identifies two groupings of provinces based on similarities in institutional quality measures. K-means analysis points to clusters with similar relative inequality levels but substantially different absolute inequality and income levels, suggesting two broad ‘types’ of provinces.
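The general shape of such a combined PCA and K-means analysis can be illustrated with a minimal sketch. It is not the paper's actual pipeline: the data below are synthetic, the number of rows and indicators is arbitrary, and the function names (`standardize`, `pca`, `kmeans`) are hypothetical helpers written in plain NumPy for illustration only.

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pca(X, n_components=2):
    """Project standardized data onto its leading principal components (via SVD)."""
    Z = standardize(X)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T          # component scores per row
    explained = (S ** 2) / np.sum(S ** 2)     # share of variance per component
    return scores, explained[:n_components]

def assign(X, centers):
    """Assign each row to its nearest center (Euclidean distance)."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k=2, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random initial centers drawn from the data."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers), centers

# Synthetic stand-in for a provinces-by-indicators table (NOT real data):
# 60 hypothetical units, 4 hypothetical institutional quality indicators.
rng = np.random.default_rng(42)
group = np.repeat([0, 1], 30)
institutions = rng.normal(loc=group[:, None] * 3.0, scale=1.0, size=(60, 4))

scores, explained = pca(institutions, n_components=2)
labels, centers = kmeans(scores, k=2)
print("Variance share of first two components:", explained.round(3))
print("Cluster sizes:", np.bincount(labels, minlength=2))
```

In this sketch the clustering is run on the component scores. Clustering directly on standardized inequality and income variables would follow the same pattern, and the choice between the two is an analytical decision rather than something this example settles.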

The results suggest that initial inequality may have a positive impact on institutions and that better-quality institutions might reduce inequality for some groups. More generally, higher incomes might imply better outcomes for both inequality and institutional quality in some cases. A final section considers the key limitations of this analysis, alongside extensions and further applications.