Concatenating

Concatenation grows a DataFrame along one of its axes: rows are appended when new data becomes available, and columns are appended when new features need to be added to the table.
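
As a minimal sketch of the two directions, the snippet below uses small made-up frames (the names `first`, `second`, `extra` and their values are illustrative assumptions, not part of the MPI dataset). It appends rows with `axis=0` and columns with `axis=1`.

```python
import pandas as pd

# Toy frames with hypothetical values, used only for illustration
first = pd.DataFrame({"group": ["A", "B"], "mpi": [0.10, 0.20]})
second = pd.DataFrame({"group": ["C", "D"], "mpi": [0.30, 0.40]})

# axis=0 appends rows: new observations that share the same columns
more_rows = pd.concat([first, second], axis=0)

# axis=1 appends columns: new features for the same rows (matched on the index)
extra = pd.DataFrame({"headcount": [24.0, 45.0]})
more_columns = pd.concat([first, extra], axis=1)

print(more_rows.shape)     # (4, 2)
print(more_columns.shape)  # (2, 3)
```

Both `first` and `extra` carry the default index 0, 1, which is why the column-wise result lines up row by row.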

Importing libraries and packages

# Mathematical operations and data manipulation
import pandas as pd

Set paths

# Path to datasets directory
data_path = "./datasets"
# Path to assets directory (where results are saved)
assets_path = "./assets"

Loading dataset

dataset = pd.read_csv(f"{data_path}/cleaned_mpi_disagg_by_groups.csv")

Wrangling

dataset.head()

| | Country | Type of survey | Survey year | Ethnic/racial/caste group | MPI: Value for the country | MPI: Value for the group | Headcount (%) | Number of multidimensionally poor people by group (thousands) | Intensity of deprivation (%) | Health (%) | ... | Cooking fuel (%) | Sanitation (%) | Drinking water (%) | Electricity (%) | Housing (%) | Assets (%) | Population share by group (%) | Population size by group (thousands) | Population size (thousands) | Region |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Bangladesh | MICS | 2019 | Bengali | 0.104060 | 0.102702 | 24.384759 | 39284.990511 | 42.117223 | 17.441109 | ... | 12.484664 | 8.274627 | 0.569494 | 2.308714 | 12.478603 | 8.562996 | 98.809242 | 161104.688057 | 163046.173 | South Asia |
| 1 | Bangladesh | MICS | 2019 | Other | 0.104060 | 0.216783 | 45.868093 | 890.521140 | 47.262356 | 10.881517 | ... | 11.733451 | 10.198139 | 8.354676 | 8.593331 | 11.536150 | 10.738271 | 1.190756 | 1941.482818 | 163046.173 | South Asia |
| 2 | Belize | MICS | 2015/2016 | Creole | 0.017109 | 0.003768 | 1.051818 | 0.940881 | 35.820526 | 52.086931 | ... | 1.126231 | 3.964365 | 1.126231 | 3.383591 | 6.162911 | 4.409921 | 22.916001 | 89.452839 | 390.351 | Latin America and the Caribbean |
| 3 | Belize | MICS | 2015/2016 | Garifuna | 0.017109 | 0.003887 | 1.097083 | 0.224891 | 35.433114 | 85.184902 | ... | 2.963020 | 2.963020 | 0.000000 | 2.963020 | 2.963020 | 2.963020 | 5.251431 | 20.499014 | 390.351 | Latin America and the Caribbean |
| 4 | Belize | MICS | 2015/2016 | Maya | 0.017109 | 0.078922 | 18.631953 | 8.557940 | 42.358151 | 37.911840 | ... | 11.931632 | 7.811719 | 2.319572 | 9.465594 | 11.165109 | 4.267081 | 11.766724 | 45.931523 | 390.351 | Latin America and the Caribbean |

5 rows × 26 columns

# Draw 4 random rows; without random_state the selection differs on every run
dataset_1 = dataset[
    [
        "Ethnic/racial/caste group",
        "Country",
        "MPI: Value for the group",
        "MPI: Value for the country",
    ]
].sample(n=4)
dataset_1

| | Ethnic/racial/caste group | Country | MPI: Value for the group | MPI: Value for the country |
| --- | --- | --- | --- | --- |
| 182 | Sénoufo/Minianka | Mali | 0.395106 | 0.376063 |
| 43 | Karo/Zimé | Chad | 0.384718 | 0.517011 |
| 121 | Mancanha | Guinea-Bissau | 0.160864 | 0.340689 |
| 154 | Kyrgyz | Kyrgyzstan | 0.000949 | 0.001426 |

dataset_2 = dataset[
    [
        "Ethnic/racial/caste group",
        "Country",
        "MPI: Value for the group",
        "MPI: Value for the country",
    ]
].sample(n=4)
dataset_2

| | Ethnic/racial/caste group | Country | MPI: Value for the group | MPI: Value for the country |
| --- | --- | --- | --- | --- |
| 88 | Mandinka | Gambia | 0.162684 | 0.203638 |
| 127 | Amerindian | Guyana | 0.047425 | 0.006592 |
| 244 | Creole | Sierra Leone | 0.094735 | 0.292899 |
| 273 | Other nationality | Togo | 0.136968 | 0.179616 |

dataset_3 = dataset[
    [
        "Ethnic/racial/caste group",
        "Country",
        "MPI: Value for the group",
        "MPI: Value for the country",
    ]
].sample(n=4)
dataset_3

| | Ethnic/racial/caste group | Country | MPI: Value for the group | MPI: Value for the country |
| --- | --- | --- | --- | --- |
| 65 | Mandé du Sud | Cote d'Ivoire | 0.264424 | 0.235871 |
| 194 | Ekoi | Nigeria | 0.137174 | 0.254390 |
| 36 | Baguirmi/Barma | Chad | 0.447317 | 0.517011 |
| 178 | Other countries | Mali | 0.137870 | 0.376063 |

# This works: axis=0 stacks the rows, and all three frames share the same columns
dataset_concat_1 = pd.concat([dataset_1, dataset_2, dataset_3], axis=0)
dataset_concat_1

| | Ethnic/racial/caste group | Country | MPI: Value for the group | MPI: Value for the country |
| --- | --- | --- | --- | --- |
| 182 | Sénoufo/Minianka | Mali | 0.395106 | 0.376063 |
| 43 | Karo/Zimé | Chad | 0.384718 | 0.517011 |
| 121 | Mancanha | Guinea-Bissau | 0.160864 | 0.340689 |
| 154 | Kyrgyz | Kyrgyzstan | 0.000949 | 0.001426 |
| 88 | Mandinka | Gambia | 0.162684 | 0.203638 |
| 127 | Amerindian | Guyana | 0.047425 | 0.006592 |
| 244 | Creole | Sierra Leone | 0.094735 | 0.292899 |
| 273 | Other nationality | Togo | 0.136968 | 0.179616 |
| 65 | Mandé du Sud | Cote d'Ivoire | 0.264424 | 0.235871 |
| 194 | Ekoi | Nigeria | 0.137174 | 0.254390 |
| 36 | Baguirmi/Barma | Chad | 0.447317 | 0.517011 |
| 178 | Other countries | Mali | 0.137870 | 0.376063 |
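
The row-wise result keeps the index labels of the sampled rows (182, 43, 121, ...). If those labels are not needed, `pd.concat` accepts `ignore_index=True` to renumber the rows. The sketch below shows the difference on two small made-up frames; `sample_a`, `sample_b` and their values are hypothetical stand-ins for the sampled data.

```python
import pandas as pd

# Hypothetical frames with scattered index labels, mimicking .sample() output
sample_a = pd.DataFrame({"group": ["A", "B"], "mpi": [0.10, 0.20]}, index=[182, 43])
sample_b = pd.DataFrame({"group": ["C", "D"], "mpi": [0.30, 0.40]}, index=[88, 127])

# Default behaviour: the original index labels are carried over unchanged
kept = pd.concat([sample_a, sample_b], axis=0)

# ignore_index=True discards them and numbers the rows 0..n-1
renumbered = pd.concat([sample_a, sample_b], axis=0, ignore_index=True)

print(kept.index.tolist())        # [182, 43, 88, 127]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Keeping the original labels is useful when rows need to be traced back to the source table; renumbering is usually more convenient when the combined frame is treated as a new dataset.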

# This does not (not really): axis=1 places the frames side by side and aligns
# rows on the index; the sampled rows have different index labels, so every
# row is filled with NaN outside its own frame's columns
dataset_concat_1 = pd.concat([dataset_1, dataset_2, dataset_3], axis=1)
dataset_concat_1

| | Ethnic/racial/caste group | Country | MPI: Value for the group | MPI: Value for the country | Ethnic/racial/caste group | Country | MPI: Value for the group | MPI: Value for the country | Ethnic/racial/caste group | Country | MPI: Value for the group | MPI: Value for the country |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 182 | Sénoufo/Minianka | Mali | 0.395106 | 0.376063 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 43 | Karo/Zimé | Chad | 0.384718 | 0.517011 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 121 | Mancanha | Guinea-Bissau | 0.160864 | 0.340689 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 154 | Kyrgyz | Kyrgyzstan | 0.000949 | 0.001426 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 88 | NaN | NaN | NaN | NaN | Mandinka | Gambia | 0.162684 | 0.203638 | NaN | NaN | NaN | NaN |
| 127 | NaN | NaN | NaN | NaN | Amerindian | Guyana | 0.047425 | 0.006592 | NaN | NaN | NaN | NaN |
| 244 | NaN | NaN | NaN | NaN | Creole | Sierra Leone | 0.094735 | 0.292899 | NaN | NaN | NaN | NaN |
| 273 | NaN | NaN | NaN | NaN | Other nationality | Togo | 0.136968 | 0.179616 | NaN | NaN | NaN | NaN |
| 65 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Mandé du Sud | Cote d'Ivoire | 0.264424 | 0.235871 |
| 194 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Ekoi | Nigeria | 0.137174 | 0.254390 |
| 36 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Baguirmi/Barma | Chad | 0.447317 | 0.517011 |
| 178 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Other countries | Mali | 0.137870 | 0.376063 |
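
The NaN pattern in the column-wise result comes from index alignment: with `axis=1`, pandas matches rows by index label rather than by position. One possible way to place frames side by side by position, sketched here on made-up frames (`left`, `right` and their values are hypothetical), is to reset each index before concatenating.

```python
import pandas as pd

# Hypothetical frames whose index labels do not overlap
left = pd.DataFrame({"group": ["A", "B"]}, index=[182, 43])
right = pd.DataFrame({"mpi": [0.30, 0.40]}, index=[88, 127])

# Labels differ, so axis=1 yields the union of labels, padded with NaN
misaligned = pd.concat([left, right], axis=1)

# Resetting both indexes makes the rows line up by position instead
aligned = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)], axis=1
)

print(misaligned.shape)                    # (4, 2)
print(int(misaligned.isna().sum().sum()))  # 4
print(aligned.shape)                       # (2, 2)
```

Pairing rows by position is only meaningful when row i of each frame describes the same entity. For independently sampled rows, as in the example above, it would combine unrelated records, so row-wise concatenation remains the appropriate choice there.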