Merging

Merging combines multiple data sources into one dataset, provided they share at least one common column (a key) whose values can be matched across the sources.
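The shared-key requirement can be seen directly. The minimal sketch below uses two small hypothetical frames (`left`, `right`, and the column name `ethnic_group` are made up for illustration; the numbers are copied from outputs on this page). When the frames have no column name in common, `pd.merge` has nothing to join on and raises `pandas.errors.MergeError`. When the key exists under different names, `left_on`/`right_on` name it on each side.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical frames: the same key is stored under different column names
left = pd.DataFrame({"group": ["Bengali", "Creole"],
                     "Nutrition (%)": [15.325885, 38.013855]})
right = pd.DataFrame({"ethnic_group": ["Creole", "Bengali"],
                      "Sanitation (%)": [3.964365, 8.274627]})

# No column name in common, so pandas cannot infer a key
try:
    pd.merge(left, right)
    no_key_error = None
except MergeError as err:
    no_key_error = str(err)
    print("MergeError:", err)

# Name the key explicitly on each side; both key columns are kept
merged = pd.merge(left, right, left_on="group", right_on="ethnic_group")
print(merged)
```

Because this is an inner merge, the rows follow the order of the left frame's keys (Bengali, then Creole).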

Importing libraries and packages

# Data manipulation and analysis
import pandas as pd

Set paths

# Path to datasets directory
data_path = "./datasets"
# Path to assets directory (where results are saved)
assets_path = "./assets"
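As an alternative to building paths with f-strings, the standard-library `pathlib` module joins path segments with `/` and can create the assets directory before anything is saved to it. This is a sketch only: it uses a temporary directory (`root`) as a hypothetical stand-in for the project root, which in the notebook would simply be `"."`.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for the project root (the notebook uses ".")
root = Path(tempfile.mkdtemp())

data_path = root / "datasets"
assets_path = root / "assets"

# Create the output directory if it is missing; no error if it already exists
assets_path.mkdir(parents=True, exist_ok=True)

csv_file = data_path / "cleaned_mpi_disagg_by_groups.csv"
print(csv_file.name)         # cleaned_mpi_disagg_by_groups.csv
print(assets_path.is_dir())  # True
```

`pd.read_csv` accepts `Path` objects directly, so `pd.read_csv(csv_file)` would work in place of the f-string version.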

Loading dataset

dataset = pd.read_csv(f"{data_path}/cleaned_mpi_disagg_by_groups.csv")

Wrangling

# Preview the first five rows
dataset.head()
| | Country | Type of survey | Survey year | Ethnic/racial/caste group | MPI: Value for the country | MPI: Value for the group | Headcount (%) | Number of multidimensionally poor people by group (thousands) | Intensity of deprivation (%) | Health (%) | ... | Cooking fuel (%) | Sanitation (%) | Drinking water (%) | Electricity (%) | Housing (%) | Assets (%) | Population share by group (%) | Population size by group (thousands) | Population size (thousands) | Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bangladesh | MICS | 2019 | Bengali | 0.104060 | 0.102702 | 24.384759 | 39284.990511 | 42.117223 | 17.441109 | ... | 12.484664 | 8.274627 | 0.569494 | 2.308714 | 12.478603 | 8.562996 | 98.809242 | 161104.688057 | 163046.173 | South Asia |
| 1 | Bangladesh | MICS | 2019 | Other | 0.104060 | 0.216783 | 45.868093 | 890.521140 | 47.262356 | 10.881517 | ... | 11.733451 | 10.198139 | 8.354676 | 8.593331 | 11.536150 | 10.738271 | 1.190756 | 1941.482818 | 163046.173 | South Asia |
| 2 | Belize | MICS | 2015/2016 | Creole | 0.017109 | 0.003768 | 1.051818 | 0.940881 | 35.820526 | 52.086931 | ... | 1.126231 | 3.964365 | 1.126231 | 3.383591 | 6.162911 | 4.409921 | 22.916001 | 89.452839 | 390.351 | Latin America and the Caribbean |
| 3 | Belize | MICS | 2015/2016 | Garifuna | 0.017109 | 0.003887 | 1.097083 | 0.224891 | 35.433114 | 85.184902 | ... | 2.963020 | 2.963020 | 0.000000 | 2.963020 | 2.963020 | 2.963020 | 5.251431 | 20.499014 | 390.351 | Latin America and the Caribbean |
| 4 | Belize | MICS | 2015/2016 | Maya | 0.017109 | 0.078922 | 18.631953 | 8.557940 | 42.358151 | 37.911840 | ... | 11.931632 | 7.811719 | 2.319572 | 9.465594 | 11.165109 | 4.267081 | 11.766724 | 45.931523 | 390.351 | Latin America and the Caribbean |

5 rows × 26 columns
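The preview shows that every group belongs to a `Country`, and a generic label such as "Other" (row 1) could plausibly recur for several countries in the full dataset. Merging on the group name alone would then pair rows from different countries. The sketch below illustrates this with hypothetical countries `A` and `B` and made-up percentages; passing a list to `on` makes the pair (country, group) the key.

```python
import pandas as pd

# Hypothetical data: the label "Other" appears for two different countries
health = pd.DataFrame({
    "Country": ["A", "B"],
    "Ethnic/racial/caste group": ["Other", "Other"],
    "Health (%)": [10.0, 20.0],
})
water = pd.DataFrame({
    "Country": ["A", "B"],
    "Ethnic/racial/caste group": ["Other", "Other"],
    "Drinking water (%)": [1.0, 2.0],
})

# Group name only: each "Other" row matches both "Other" rows (2 x 2 = 4 rows)
by_group = pd.merge(
    water.drop(columns="Country"),
    health.drop(columns="Country"),
    on="Ethnic/racial/caste group",
)

# Country + group: each row is paired only with its own counterpart
by_country_group = pd.merge(
    water, health, on=["Country", "Ethnic/racial/caste group"]
)
print(len(by_group), len(by_country_group))  # 4 2
```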

# Three columns from the first four rows (positions 0-3)
dataset_1 = dataset[
    ["Child mortality (%)", "Nutrition (%)", "Ethnic/racial/caste group"]
].iloc[0:4]
dataset_1
| | Child mortality (%) | Nutrition (%) | Ethnic/racial/caste group |
|---|---|---|---|
| 0 | 2.115224 | 15.325885 | Bengali |
| 1 | 1.010234 | 9.871282 | Other |
| 2 | 14.073077 | 38.013855 | Creole |
| 3 | 38.147920 | 47.036982 | Garifuna |
# The shared key plus two other columns, from the same four rows
dataset_2 = dataset[
    ["Ethnic/racial/caste group", "Sanitation (%)", "Drinking water (%)"]
].iloc[0:4]
dataset_2
| | Ethnic/racial/caste group | Sanitation (%) | Drinking water (%) |
|---|---|---|---|
| 0 | Bengali | 8.274627 | 0.569494 |
| 1 | Other | 10.198139 | 8.354676 |
| 2 | Creole | 3.964365 | 1.126231 |
| 3 | Garifuna | 2.963020 | 0.000000 |
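Before merging, it is worth checking that the key is unique on the side(s) where you expect it to be. The sketch below uses hypothetical frames (the duplicated "Other" row is made up) to show two real pandas tools: `Series.is_unique`, and the `validate` argument of `pd.merge`, which raises `MergeError` when the key relationship is not what was declared.

```python
import pandas as pd
from pandas.errors import MergeError

key = "Ethnic/racial/caste group"
left = pd.DataFrame({key: ["Bengali", "Other"],
                     "Nutrition (%)": [15.325885, 9.871282]})
# Hypothetical duplicate: "Other" is listed twice on the right
right = pd.DataFrame({key: ["Bengali", "Other", "Other"],
                      "Sanitation (%)": [8.274627, 10.198139, 10.198139]})

left_is_unique = left[key].is_unique    # True
right_is_unique = right[key].is_unique  # False

# Declaring a one-to-one relationship fails because the right key repeats
try:
    pd.merge(left, right, on=key, validate="one_to_one")
    validation_error = None
except MergeError as err:
    validation_error = str(err)

# One-to-many only requires a unique left key, so this merge is allowed
one_to_many = pd.merge(left, right, on=key, validate="one_to_many")
print(left_is_unique, right_is_unique, len(one_to_many))  # True False 3
```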
# Inner merge: keep only groups present in both frames (here, all four)
pd.merge(dataset_1, dataset_2, on="Ethnic/racial/caste group", how="inner")
| | Child mortality (%) | Nutrition (%) | Ethnic/racial/caste group | Sanitation (%) | Drinking water (%) |
|---|---|---|---|---|---|
| 0 | 2.115224 | 15.325885 | Bengali | 8.274627 | 0.569494 |
| 1 | 1.010234 | 9.871282 | Other | 10.198139 | 8.354676 |
| 2 | 14.073077 | 38.013855 | Creole | 3.964365 | 1.126231 |
| 3 | 38.147920 | 47.036982 | Garifuna | 2.963020 | 0.000000 |
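`pd.merge(a, b, ...)` has a method form, `a.merge(b, ...)`, that takes the same arguments. The sketch below rebuilds the first two rows of `dataset_1` and `dataset_2` from the outputs above and checks that both forms give the same frame, with the left frame's columns first (the key stays in its original position) followed by the right frame's non-key columns.

```python
import pandas as pd

# First two rows of dataset_1 and dataset_2, copied from the outputs above
left = pd.DataFrame({"Child mortality (%)": [2.115224, 1.010234],
                     "Nutrition (%)": [15.325885, 9.871282],
                     "Ethnic/racial/caste group": ["Bengali", "Other"]})
right = pd.DataFrame({"Ethnic/racial/caste group": ["Bengali", "Other"],
                      "Sanitation (%)": [8.274627, 10.198139],
                      "Drinking water (%)": [0.569494, 8.354676]})

function_form = pd.merge(left, right, on="Ethnic/racial/caste group", how="inner")
method_form = left.merge(right, on="Ethnic/racial/caste group", how="inner")

print(method_form.equals(function_form))  # True
print(list(method_form.columns))
```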
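# drop_duplicates() removes fully identical rows; every key here is unique,
# so the result is unchanged
pd.merge(
    dataset_1, dataset_2, on="Ethnic/racial/caste group", how="inner"
).drop_duplicates()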
| | Child mortality (%) | Nutrition (%) | Ethnic/racial/caste group | Sanitation (%) | Drinking water (%) |
|---|---|---|---|---|---|
| 0 | 2.115224 | 15.325885 | Bengali | 8.274627 | 0.569494 |
| 1 | 1.010234 | 9.871282 | Other | 10.198139 | 8.354676 |
| 2 | 14.073077 | 38.013855 | Creole | 3.964365 | 1.126231 |
| 3 | 38.147920 | 47.036982 | Garifuna | 2.963020 | 0.000000 |
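`drop_duplicates()` only changes the result when the merge produces rows that are identical in every column, for example when an input frame already contains a repeated row. The sketch below assumes a hypothetical input in which the Bengali row was stored twice; the merge copies that duplication and `drop_duplicates()` removes it.

```python
import pandas as pd

key = "Ethnic/racial/caste group"
# Hypothetical input: the Bengali row appears twice
left = pd.DataFrame({key: ["Bengali", "Bengali", "Other"],
                     "Nutrition (%)": [15.325885, 15.325885, 9.871282]})
right = pd.DataFrame({key: ["Bengali", "Other"],
                      "Sanitation (%)": [8.274627, 10.198139]})

merged = pd.merge(left, right, on=key, how="inner")
deduplicated = merged.drop_duplicates()
print(len(merged), len(deduplicated))  # 3 2
```

The surviving rows keep their original index labels (here 0 and 2); `drop_duplicates(ignore_index=True)` renumbers them from 0.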
# Rows 2-5: overlaps dataset_1 only on Creole and Garifuna
dataset_3 = dataset[
    ["Ethnic/racial/caste group", "Health (%)", "Intensity of deprivation (%)"]
].iloc[2:6]
dataset_3
| | Ethnic/racial/caste group | Health (%) | Intensity of deprivation (%) |
|---|---|---|---|
| 2 | Creole | 52.086931 | 35.820526 |
| 3 | Garifuna | 85.184902 | 35.433114 |
| 4 | Maya | 37.911840 | 42.358151 |
| 5 | Mestizo/Spanish/Latino | 34.317536 | 36.699757 |
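`dataset_3` shares only the key column with `dataset_1`. If both inputs also carried a non-key column with the same name, pandas would keep both copies and rename them with the suffixes `_x` and `_y` by default. A sketch with hypothetical frames `a` and `b` (the second frame's values are made up) and hypothetical suffix names:

```python
import pandas as pd

key = "Ethnic/racial/caste group"
# Both frames carry "Health (%)" besides the key; b's values are hypothetical
a = pd.DataFrame({key: ["Creole", "Maya"], "Health (%)": [52.086931, 37.911840]})
b = pd.DataFrame({key: ["Creole", "Maya"], "Health (%)": [50.0, 40.0]})

default = pd.merge(a, b, on=key)
custom = pd.merge(a, b, on=key, suffixes=("_survey_1", "_survey_2"))

print(list(default.columns))  # [..., 'Health (%)_x', 'Health (%)_y']
print(list(custom.columns))   # [..., 'Health (%)_survey_1', 'Health (%)_survey_2']
```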
# Inner merge: only the two groups found in both frames remain
pd.merge(
    dataset_1, dataset_3, on="Ethnic/racial/caste group", how="inner"
).drop_duplicates()
| | Child mortality (%) | Nutrition (%) | Ethnic/racial/caste group | Health (%) | Intensity of deprivation (%) |
|---|---|---|---|---|---|
| 0 | 14.073077 | 38.013855 | Creole | 52.086931 | 35.820526 |
| 1 | 38.147920 | 47.036982 | Garifuna | 85.184902 | 35.433114 |
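Between the inner merge (keys in both frames) and the outer merge (keys in either frame) sit `how="left"` and `how="right"`, which keep every key from one side. The sketch below reuses the key columns of `dataset_1` and `dataset_3` from the outputs above, with one value column each; unmatched rows get `NaN` in the columns that come from the other frame.

```python
import pandas as pd

key = "Ethnic/racial/caste group"
# Keys and one value column each, copied from dataset_1 and dataset_3 above
left = pd.DataFrame({key: ["Bengali", "Other", "Creole", "Garifuna"],
                     "Nutrition (%)": [15.325885, 9.871282, 38.013855, 47.036982]})
right = pd.DataFrame({key: ["Creole", "Garifuna", "Maya", "Mestizo/Spanish/Latino"],
                      "Health (%)": [52.086931, 85.184902, 37.911840, 34.317536]})

# how="left": all four dataset_1 groups; Bengali and Other get NaN health values
left_join = pd.merge(left, right, on=key, how="left")
# how="right": all four dataset_3 groups; Maya and Mestizo get NaN nutrition values
right_join = pd.merge(left, right, on=key, how="right")

print(left_join[key].tolist())
print(right_join[key].tolist())
```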
# Outer merge: every group from either frame; missing values become NaN
pd.merge(
    dataset_1, dataset_3, on="Ethnic/racial/caste group", how="outer"
).drop_duplicates()
| | Child mortality (%) | Nutrition (%) | Ethnic/racial/caste group | Health (%) | Intensity of deprivation (%) |
|---|---|---|---|---|---|
| 0 | 2.115224 | 15.325885 | Bengali | NaN | NaN |
| 1 | 1.010234 | 9.871282 | Other | NaN | NaN |
| 2 | 14.073077 | 38.013855 | Creole | 52.086931 | 35.820526 |
| 3 | 38.147920 | 47.036982 | Garifuna | 85.184902 | 35.433114 |
| 4 | NaN | NaN | Maya | 37.911840 | 42.358151 |
| 5 | NaN | NaN | Mestizo/Spanish/Latino | 34.317536 | 36.699757 |
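After an outer merge, `indicator=True` adds a `_merge` column recording whether each row's key came from the left frame only, the right frame only, or both, which is more direct than inferring it from `NaN` patterns. The sketch below uses only the key columns of `dataset_1` and `dataset_3`. Note that newer pandas releases sort the keys of an outer merge lexicographically, so the row order may differ from the output above; the checks below therefore avoid relying on order.

```python
import pandas as pd

key = "Ethnic/racial/caste group"
left = pd.DataFrame({key: ["Bengali", "Other", "Creole", "Garifuna"]})
right = pd.DataFrame({key: ["Creole", "Garifuna", "Maya", "Mestizo/Spanish/Latino"]})

outer = pd.merge(left, right, on=key, how="outer", indicator=True)

# How many keys came from each side
counts = outer["_merge"].value_counts()
print(counts)

# Groups that exist only in the right frame
right_only = set(outer.loc[outer["_merge"] == "right_only", key])
print(right_only)
```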