Dropping Missing Values with dropna

Importing libraries and packages

# Mathematical operations and data manipulation
import pandas as pd

Set paths

# Path to datasets directory
data_path = "./datasets"
# Path to assets directory (for saving results to)
assets_path = "./assets"

Loading dataset

dataset = pd.read_csv(f"{data_path}/changed_columns_mpi_disagg_by_groups.csv")

Wrangling

dataset.head()
Country Country.1 Type of survey Type of survey.1 Survey year Survey year.1 Ethnic/racial/caste group Ethnic/racial/caste group.1 MPI: Value for the country MPI: Value for the country.1 ... Electricity (%) Electricity Housing (%) Housing Assets (%) a.5 Population share by group (%) Population size by group (thousands) Population size (thousands) Region
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... (%) NaN (%) NaN (%) NaN (%) (thousands) (thousands) NaN
1 Bangladesh NaN MICS NaN 2019 NaN Bengali NaN 0.104060 NaN ... 2.3087142035365105 NaN 12.478602677583694 NaN 8.562996238470078 NaN 98.80924224853516 161104.68805653573 163046.173 South Asia
2 Bangladesh NaN MICS NaN 2019 NaN Other NaN 0.104060 NaN ... 8.593330532312393 NaN 11.536150425672531 NaN 10.738271474838257 NaN 1.1907564476132393 1941.4828175841367 163046.173 South Asia
3 Belize NaN MICS NaN 2015/2016 NaN Creole NaN 0.017109 NaN ... 3.3835913985967636 NaN 6.162910908460617 NaN 4.409920796751976 NaN 22.916001081466675 89.45283938151599 390.351 Latin America and the Caribbean
4 Belize NaN MICS NaN 2015/2016 NaN Garifuna NaN 0.017109 NaN ... 2.963019721210003 NaN 2.963019721210003 NaN 2.963019721210003 NaN 5.251431092619896 20.49901378435269 390.351 Latin America and the Caribbean

5 rows × 48 columns

dataset.shape
(322, 48)
# Drop the columns in which every value is missing
dataset.dropna(how="all", axis=1, inplace=True)
dataset
Country Type of survey Survey year Ethnic/racial/caste group MPI: Value for the country MPI: Value for the country.1 MPI: Value for the group a Headcount (%) a.1 ... Electricity (%) Electricity Housing (%) Housing Assets (%) a.5 Population share by group (%) Population size by group (thousands) Population size (thousands) Region
0 NaN NaN NaN NaN NaN NaN NaN NaN (%) NaN ... (%) NaN (%) NaN (%) NaN (%) (thousands) (thousands) NaN
1 Bangladesh MICS 2019 Bengali 0.104060 NaN 0.102702 NaN 24.3847593665123 NaN ... 2.3087142035365105 NaN 12.478602677583694 NaN 8.562996238470078 NaN 98.80924224853516 161104.68805653573 163046.173 South Asia
2 Bangladesh MICS 2019 Other 0.104060 NaN 0.216783 NaN 45.86809277534485 NaN ... 8.593330532312393 NaN 11.536150425672531 NaN 10.738271474838257 NaN 1.1907564476132393 1941.4828175841367 163046.173 South Asia
3 Belize MICS 2015/2016 Creole 0.017109 NaN 0.003768 NaN 1.0518179275095463 NaN ... 3.3835913985967636 NaN 6.162910908460617 NaN 4.409920796751976 NaN 22.916001081466675 89.45283938151599 390.351 Latin America and the Caribbean
4 Belize MICS 2015/2016 Garifuna 0.017109 NaN 0.003887 NaN 1.0970834642648697 NaN ... 2.963019721210003 NaN 2.963019721210003 NaN 2.963019721210003 NaN 5.251431092619896 20.49901378435269 390.351 Latin America and the Caribbean
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
317 Column 3: Refers to the self-identified ethnic... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
318 Columns 4-21: HDRO and OPHI calculations based... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
319 Columns 22 and 23: HDRO and OPHI calculations ... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
320 Column 24: United Nations Department of Econom... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
321 Column 25: UNDP classification of developing r... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

322 rows × 44 columns
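The difference between `how="all"` and the default `how="any"` is easy to miss, so here is a minimal sketch on a small, made-up frame (the `toy` data and variable names are illustrative assumptions, not part of the MPI dataset). It shows that `how="all"` removes only columns that are entirely empty, while `how="any"` also removes columns with a single missing value.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical data): "b" is entirely empty, "c" is partly empty
toy = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0],
        "b": [np.nan, np.nan, np.nan],
        "c": [np.nan, 5.0, 6.0],
    }
)

# how="all" drops a column only if every value in it is missing
only_empty_dropped = toy.dropna(how="all", axis=1)

# how="any" (the default) drops a column if it has at least one missing value
any_missing_dropped = toy.dropna(how="any", axis=1)

print(list(only_empty_dropped.columns))
print(list(any_missing_dropped.columns))
```

With `how="all"`, the partly filled column `c` survives, which is why that option suits removing the spacer columns here without losing data.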

# Drop multiple columns by name
dataset.drop(
    [
        "MPI: Value for the country.1",
        "a",
        "a.1",
        "a.2",
        "a.3",
        "Health",
        "Education",
        "a.4",
        "Nutrition",
        "Child mortality",
        "Years of schooling",
        "School attendance",
        "Cooking fuel",
        "Sanitation",
        "Drinking water",
        "Electricity",
        "Housing",
        "a.5",
    ],
    axis=1,
    inplace=True,
)
dataset
Country Type of survey Survey year Ethnic/racial/caste group MPI: Value for the country MPI: Value for the group Headcount (%) Number of multidimensionally poor people by group (thousands) Intensity of deprivation (%) Health (%) ... Cooking fuel (%) Sanitation (%) Drinking water (%) Electricity (%) Housing (%) Assets (%) Population share by group (%) Population size by group (thousands) Population size (thousands) Region
0 NaN NaN NaN NaN NaN NaN (%) (thousands) (%) (%) ... (%) (%) (%) (%) (%) (%) (%) (thousands) (thousands) NaN
1 Bangladesh MICS 2019 Bengali 0.104060 0.102702 24.3847593665123 39284.990510756514 42.117223143577576 17.44110882282257 ... 12.484664469957352 8.274626731872559 0.5694943480193615 2.3087142035365105 12.478602677583694 8.562996238470078 98.80924224853516 161104.68805653573 163046.173 South Asia
2 Bangladesh MICS 2019 Other 0.104060 0.216783 45.86809277534485 890.521139986871 47.26235568523407 10.88151652365923 ... 11.73345148563385 10.198139399290085 8.354675769805908 8.593330532312393 11.536150425672531 10.738271474838257 1.1907564476132393 1941.4828175841367 163046.173 South Asia
3 Belize MICS 2015/2016 Creole 0.017109 0.003768 1.0518179275095463 0.9408810012811046 35.820525884628296 52.086931467056274 ... 1.1262305080890656 3.9643649011850357 1.1262305080890656 3.3835913985967636 6.162910908460617 4.409920796751976 22.916001081466675 89.45283938151599 390.351 Latin America and the Caribbean
4 Belize MICS 2015/2016 Garifuna 0.017109 0.003887 1.0970834642648697 0.22489129056550966 35.43311357498169 85.18490195274353 ... 2.963019721210003 2.963019721210003 0 2.963019721210003 2.963019721210003 2.963019721210003 5.251431092619896 20.49901378435269 390.351 Latin America and the Caribbean
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
317 Column 3: Refers to the self-identified ethnic... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
318 Columns 4-21: HDRO and OPHI calculations based... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
319 Columns 22 and 23: HDRO and OPHI calculations ... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
320 Column 24: United Nations Department of Econom... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
321 Column 25: UNDP classification of developing r... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

322 rows × 26 columns

# Drop the first row, which holds the units (these are now included in the column headers)
dataset.drop([0], axis=0, inplace=True)
dataset
Country Type of survey Survey year Ethnic/racial/caste group MPI: Value for the country MPI: Value for the group Headcount (%) Number of multidimensionally poor people by group (thousands) Intensity of deprivation (%) Health (%) ... Cooking fuel (%) Sanitation (%) Drinking water (%) Electricity (%) Housing (%) Assets (%) Population share by group (%) Population size by group (thousands) Population size (thousands) Region
1 Bangladesh MICS 2019 Bengali 0.104060 0.102702 24.3847593665123 39284.990510756514 42.117223143577576 17.44110882282257 ... 12.484664469957352 8.274626731872559 0.5694943480193615 2.3087142035365105 12.478602677583694 8.562996238470078 98.80924224853516 161104.68805653573 163046.173 South Asia
2 Bangladesh MICS 2019 Other 0.104060 0.216783 45.86809277534485 890.521139986871 47.26235568523407 10.88151652365923 ... 11.73345148563385 10.198139399290085 8.354675769805908 8.593330532312393 11.536150425672531 10.738271474838257 1.1907564476132393 1941.4828175841367 163046.173 South Asia
3 Belize MICS 2015/2016 Creole 0.017109 0.003768 1.0518179275095463 0.9408810012811046 35.820525884628296 52.086931467056274 ... 1.1262305080890656 3.9643649011850357 1.1262305080890656 3.3835913985967636 6.162910908460617 4.409920796751976 22.916001081466675 89.45283938151599 390.351 Latin America and the Caribbean
4 Belize MICS 2015/2016 Garifuna 0.017109 0.003887 1.0970834642648697 0.22489129056550966 35.43311357498169 85.18490195274353 ... 2.963019721210003 2.963019721210003 0 2.963019721210003 2.963019721210003 2.963019721210003 5.251431092619896 20.49901378435269 390.351 Latin America and the Caribbean
5 Belize MICS 2015/2016 Maya 0.017109 0.078922 18.63195300102234 8.557939781538732 42.35815107822418 37.91183978319168 ... 11.931631714105606 7.811719179153442 2.3195721209049225 9.465593844652176 11.165109276771545 4.2670805007219315 11.766723543405533 45.93152301891893 390.351 Latin America and the Caribbean
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
317 Column 3: Refers to the self-identified ethnic... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
318 Columns 4-21: HDRO and OPHI calculations based... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
319 Columns 22 and 23: HDRO and OPHI calculations ... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
320 Column 24: United Nations Department of Econom... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
321 Column 25: UNDP classification of developing r... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

321 rows × 26 columns
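Many values in the output above are printed at full precision, which suggests (though the notebook does not confirm it) that the units row caused those columns to be read as text rather than numbers. The sketch below uses a made-up one-column frame to show that dropping the units row alone does not change the column type, and that `pd.to_numeric` can convert it afterwards. The `toy` data and variable names are assumptions for illustration.

```python
import pandas as pd

# Toy column (hypothetical data): a units label followed by numeric strings
toy = pd.DataFrame({"Headcount (%)": ["(%)", "24.38", "45.87"]})

# Remove the units row by its index label, as in the step above
without_units = toy.drop([0], axis=0)

# The remaining values are still stored as text, so convert them explicitly
converted = pd.to_numeric(without_units["Headcount (%)"])

print(converted.dtype)
```

Whether this conversion is needed for the real dataset depends on what `dataset.dtypes` reports after the units row is removed.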

dataset.isnull().sum()
Country                                                           3
Type of survey                                                   25
Survey year                                                      25
Ethnic/racial/caste group                                        25
MPI: Value for the country                                       25
MPI: Value for the group                                         25
Headcount (%)                                                    25
Number of multidimensionally poor people by group (thousands)    25
Intensity of deprivation (%)                                     25
Health (%)                                                       25
Education (%)                                                    25
Standard of living (%)                                           25
Nutrition (%)                                                    25
Child mortality (%)                                              25
Years of schooling (%)                                           25
School attendance (%)                                            25
Cooking fuel (%)                                                 25
Sanitation (%)                                                   25
Drinking water (%)                                               25
Electricity (%)                                                  25
 Housing (%)                                                     25
Assets (%)                                                       25
Population share by group (%)                                    25
Population size by group (thousands)                             25
Population size (thousands)                                      26
Region                                                           25
dtype: int64
# Drop the last 25 rows, which contain notes
n = 25
dataset.drop(dataset.tail(n).index, inplace=True)
dataset
Country Type of survey Survey year Ethnic/racial/caste group MPI: Value for the country MPI: Value for the group Headcount (%) Number of multidimensionally poor people by group (thousands) Intensity of deprivation (%) Health (%) ... Cooking fuel (%) Sanitation (%) Drinking water (%) Electricity (%) Housing (%) Assets (%) Population share by group (%) Population size by group (thousands) Population size (thousands) Region
1 Bangladesh MICS 2019 Bengali 0.104060 0.102702 24.3847593665123 39284.990510756514 42.117223143577576 17.44110882282257 ... 12.484664469957352 8.274626731872559 0.5694943480193615 2.3087142035365105 12.478602677583694 8.562996238470078 98.80924224853516 161104.68805653573 163046.173 South Asia
2 Bangladesh MICS 2019 Other 0.104060 0.216783 45.86809277534485 890.521139986871 47.26235568523407 10.88151652365923 ... 11.73345148563385 10.198139399290085 8.354675769805908 8.593330532312393 11.536150425672531 10.738271474838257 1.1907564476132393 1941.4828175841367 163046.173 South Asia
3 Belize MICS 2015/2016 Creole 0.017109 0.003768 1.0518179275095463 0.9408810012811046 35.820525884628296 52.086931467056274 ... 1.1262305080890656 3.9643649011850357 1.1262305080890656 3.3835913985967636 6.162910908460617 4.409920796751976 22.916001081466675 89.45283938151599 390.351 Latin America and the Caribbean
4 Belize MICS 2015/2016 Garifuna 0.017109 0.003887 1.0970834642648697 0.22489129056550966 35.43311357498169 85.18490195274353 ... 2.963019721210003 2.963019721210003 0 2.963019721210003 2.963019721210003 2.963019721210003 5.251431092619896 20.49901378435269 390.351 Latin America and the Caribbean
5 Belize MICS 2015/2016 Maya 0.017109 0.078922 18.63195300102234 8.557939781538732 42.35815107822418 37.91183978319168 ... 11.931631714105606 7.811719179153442 2.3195721209049225 9.465593844652176 11.165109276771545 4.2670805007219315 11.766723543405533 45.93152301891893 390.351 Latin America and the Caribbean
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
292 Uganda DHS 2016 Lango 0.281028 0.331589 67.10416078567505 1765.8691906057863 49.414002895355225 22.405487298965454 ... 11.113078892230988 10.360502451658249 8.658754825592041 10.666937381029129 9.859275072813034 4.350616410374641 5.94433955848217 2631.53457241768 44269.587 Sub-Saharan Africa
293 Uganda DHS 2016 Lugbara 0.281028 0.380233 71.11892700195312 809.2593583809123 53.46432328224182 21.582941338419914 ... 10.391145944595337 10.167766362428665 7.661116868257523 8.479014784097672 9.50494185090065 5.541577190160751 2.5703784078359604 1137.8959054861552 44269.587 Sub-Saharan Africa
294 Uganda DHS 2016 Other 0.281028 0.348234 66.4842426776886 5747.428972023331 52.37846374511719 24.311363324522972 ... 10.578002780675888 9.857720136642456 8.236721158027649 9.430650621652603 9.725867211818695 5.491462349891663 19.527624547481537 8644.798738080695 44269.587 Sub-Saharan Africa
295 Viet Nam MICS 2013/2014 Ethnic minorities 0.019334 0.070516 16.658276319503784 2241.5703036337763 42.33120679855347 14.158439636230469 ... 12.865620851516724 11.510226875543594 5.163108557462692 1.8797459080815315 8.144484460353851 3.68308387696743 13.949722051620483 13456.195951133965 96462.108 East Asia and the Paficic
296 Viet Nam MICS 2013/2014 Kinh/Hoa 0.019334 0.011037 2.988246828317642 2480.421534116371 36.93450093269348 16.32026880979538 ... 12.656877934932709 11.753645539283752 3.3999428153038025 0.6741166580468416 9.58368182182312 2.9566552489995956 86.05027794837952 83005.91204886603 96462.108 East Asia and the Paficic

296 rows × 26 columns
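The hard-coded count of 25 works for this file, but it breaks silently if the number of note rows changes. As an alternative sketch, the trailing rows can be identified by their content: in the null counts above, the note rows have text only in `Country` and are missing everywhere else. The example below applies `dropna` with `subset` and `how="all"` to a made-up frame; the `toy` data, the note texts, and `value_columns` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical data): two data rows, one blank row, two note rows
toy = pd.DataFrame(
    {
        "Country": ["Bangladesh", "Belize", np.nan, "Notes", "Note on column 3"],
        "Region": ["South Asia", "Latin America and the Caribbean", np.nan, np.nan, np.nan],
        "Headcount (%)": [24.4, 1.1, np.nan, np.nan, np.nan],
    }
)

# Every column except "Country" is empty in the blank and note rows
value_columns = [col for col in toy.columns if col != "Country"]

# Drop rows in which all of those value columns are missing
cleaned = toy.dropna(subset=value_columns, how="all")

print(cleaned)
```

Because `how="all"` is used, a data row with a single missing value (such as the Georgia row in this dataset) is kept.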

dataset.isnull().sum()
Country                                                          0
Type of survey                                                   0
Survey year                                                      0
Ethnic/racial/caste group                                        0
MPI: Value for the country                                       0
MPI: Value for the group                                         0
Headcount (%)                                                    0
Number of multidimensionally poor people by group (thousands)    0
Intensity of deprivation (%)                                     0
Health (%)                                                       0
Education (%)                                                    0
Standard of living (%)                                           0
Nutrition (%)                                                    0
Child mortality (%)                                              0
Years of schooling (%)                                           0
School attendance (%)                                            0
Cooking fuel (%)                                                 0
Sanitation (%)                                                   0
Drinking water (%)                                               0
Electricity (%)                                                  0
 Housing (%)                                                     0
Assets (%)                                                       0
Population share by group (%)                                    0
Population size by group (thousands)                             0
Population size (thousands)                                      1
Region                                                           0
dtype: int64
# Show the rows that still contain at least one missing value
null_data = dataset[dataset.isnull().any(axis=1)]
null_data
Country Type of survey Survey year Ethnic/racial/caste group MPI: Value for the country MPI: Value for the group Headcount (%) Number of multidimensionally poor people by group (thousands) Intensity of deprivation (%) Health (%) ... Cooking fuel (%) Sanitation (%) Drinking water (%) Electricity (%) Housing (%) Assets (%) Population share by group (%) Population size by group (thousands) Population size (thousands) Region
96 Georgia MICS 2018 Other 0.001245 0.0 0 0 0 0 ... 0 0 0 0 0 0 2.4445995688438416 97.70482661971451 NaN Europe and Central Asia

1 rows × 26 columns

# Population size is known from the previous rows for the same country
# dataset.fillna({"Population size (thousands)": 3996.762}, inplace=True)
dataset["Population size (thousands)"] = dataset["Population size (thousands)"].ffill()
dataset.isnull().sum()
Country                                                          0
Type of survey                                                   0
Survey year                                                      0
Ethnic/racial/caste group                                        0
MPI: Value for the country                                       0
MPI: Value for the group                                         0
Headcount (%)                                                    0
Number of multidimensionally poor people by group (thousands)    0
Intensity of deprivation (%)                                     0
Health (%)                                                       0
Education (%)                                                    0
Standard of living (%)                                           0
Nutrition (%)                                                    0
Child mortality (%)                                              0
Years of schooling (%)                                           0
School attendance (%)                                            0
Cooking fuel (%)                                                 0
Sanitation (%)                                                   0
Drinking water (%)                                               0
Electricity (%)                                                  0
 Housing (%)                                                     0
Assets (%)                                                       0
Population share by group (%)                                    0
Population size by group (thousands)                             0
Population size (thousands)                                      0
Region                                                           0
dtype: int64
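A forward fill over the whole column is correct here only because the row above the gap belongs to the same country. If a gap fell on the first row of a country, the value would be copied from a different country. A minimal sketch of a more defensive variant, filling within each country only, is shown below on made-up data (the `toy` frame and its values are assumptions for illustration).

```python
import numpy as np
import pandas as pd

col = "Population size (thousands)"

# Toy frame (hypothetical data): the gap is on the first row of country "B"
toy = pd.DataFrame(
    {
        "Country": ["A", "A", "B", "B"],
        col: [100.0, 100.0, np.nan, 250.0],
    }
)

# A plain forward fill copies country A's value into country B's first row
plain = toy[col].ffill()

# Filling within each country (forwards, then backwards) uses only that country's values
grouped = toy.groupby("Country")[col].transform(lambda s: s.ffill().bfill())

print(plain.tolist())
print(grouped.tolist())
```

This relies on the population size being constant within a country, which holds for the rows shown in this dataset.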
# Save the cleaned dataset without the index column
dataset.to_csv(f"{data_path}/cleaned_mpi_disagg_by_groups.csv", index=False)