Dropping Missing Values with dropna
Importing libraries and packages
# Mathematical operations and data manipulation
import pandas as pd
Set paths
# Path to datasets directory
data_path = "./datasets"
# Path to assets directory (for saving results to)
assets_path = "./assets"
Loading dataset
dataset = pd.read_csv(f"{data_path}/changed_columns_mpi_disagg_by_groups.csv")
Wrangling
dataset.head()
| | Country | Country.1 | Type of survey | Type of survey.1 | Survey year | Survey year.1 | Ethnic/racial/caste group | Ethnic/racial/caste group.1 | MPI: Value for the country | MPI: Value for the country.1 | ... | Electricity (%) | Electricity | Housing (%) | Housing | Assets (%) | a.5 | Population share by group (%) | Population size by group (thousands) | Population size (thousands) | Region |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | (%) | NaN | (%) | NaN | (%) | NaN | (%) | (thousands) | (thousands) | NaN |
1 | Bangladesh | NaN | MICS | NaN | 2019 | NaN | Bengali | NaN | 0.104060 | NaN | ... | 2.3087142035365105 | NaN | 12.478602677583694 | NaN | 8.562996238470078 | NaN | 98.80924224853516 | 161104.68805653573 | 163046.173 | South Asia |
2 | Bangladesh | NaN | MICS | NaN | 2019 | NaN | Other | NaN | 0.104060 | NaN | ... | 8.593330532312393 | NaN | 11.536150425672531 | NaN | 10.738271474838257 | NaN | 1.1907564476132393 | 1941.4828175841367 | 163046.173 | South Asia |
3 | Belize | NaN | MICS | NaN | 2015/2016 | NaN | Creole | NaN | 0.017109 | NaN | ... | 3.3835913985967636 | NaN | 6.162910908460617 | NaN | 4.409920796751976 | NaN | 22.916001081466675 | 89.45283938151599 | 390.351 | Latin America and the Caribbean |
4 | Belize | NaN | MICS | NaN | 2015/2016 | NaN | Garifuna | NaN | 0.017109 | NaN | ... | 2.963019721210003 | NaN | 2.963019721210003 | NaN | 2.963019721210003 | NaN | 5.251431092619896 | 20.49901378435269 | 390.351 | Latin America and the Caribbean |
5 rows × 48 columns
dataset.shape
(322, 48)
# Drop the columns in which every value is missing
dataset.dropna(how="all", axis=1, inplace=True)
dataset
| | Country | Type of survey | Survey year | Ethnic/racial/caste group | MPI: Value for the country | MPI: Value for the country.1 | MPI: Value for the group | a | Headcount (%) | a.1 | ... | Electricity (%) | Electricity | Housing (%) | Housing | Assets (%) | a.5 | Population share by group (%) | Population size by group (thousands) | Population size (thousands) | Region |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | (%) | NaN | ... | (%) | NaN | (%) | NaN | (%) | NaN | (%) | (thousands) | (thousands) | NaN |
1 | Bangladesh | MICS | 2019 | Bengali | 0.104060 | NaN | 0.102702 | NaN | 24.3847593665123 | NaN | ... | 2.3087142035365105 | NaN | 12.478602677583694 | NaN | 8.562996238470078 | NaN | 98.80924224853516 | 161104.68805653573 | 163046.173 | South Asia |
2 | Bangladesh | MICS | 2019 | Other | 0.104060 | NaN | 0.216783 | NaN | 45.86809277534485 | NaN | ... | 8.593330532312393 | NaN | 11.536150425672531 | NaN | 10.738271474838257 | NaN | 1.1907564476132393 | 1941.4828175841367 | 163046.173 | South Asia |
3 | Belize | MICS | 2015/2016 | Creole | 0.017109 | NaN | 0.003768 | NaN | 1.0518179275095463 | NaN | ... | 3.3835913985967636 | NaN | 6.162910908460617 | NaN | 4.409920796751976 | NaN | 22.916001081466675 | 89.45283938151599 | 390.351 | Latin America and the Caribbean |
4 | Belize | MICS | 2015/2016 | Garifuna | 0.017109 | NaN | 0.003887 | NaN | 1.0970834642648697 | NaN | ... | 2.963019721210003 | NaN | 2.963019721210003 | NaN | 2.963019721210003 | NaN | 5.251431092619896 | 20.49901378435269 | 390.351 | Latin America and the Caribbean |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
317 | Column 3: Refers to the self-identified ethnic... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
318 | Columns 4-21: HDRO and OPHI calculations based... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
319 | Columns 22 and 23: HDRO and OPHI calculations ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
320 | Column 24: United Nations Department of Econom... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
321 | Column 25: UNDP classification of developing r... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
322 rows × 44 columns
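The effect of `how="all"` with `axis=1` is easier to see on a small frame. The sketch below uses a made-up `toy` frame (its column names are illustrative and not part of the MPI dataset) to contrast `how="all"` with the default `how="any"`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: "empty" holds only NaN, "partial" holds some NaN
toy = pd.DataFrame(
    {
        "name": ["a", "b", "c"],
        "empty": [np.nan, np.nan, np.nan],
        "partial": [1.0, np.nan, 3.0],
    }
)

# how="all": a column is dropped only if every value in it is missing
only_all_nan_dropped = toy.dropna(how="all", axis=1)

# how="any" (the default): a column is dropped as soon as one value is missing
any_nan_dropped = toy.dropna(how="any", axis=1)

print(list(only_all_nan_dropped.columns))
print(list(any_nan_dropped.columns))
```

With `how="all"` the partially filled column survives, which is why only the 4 completely empty duplicate columns disappear from the dataset above (48 columns become 44).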
# Drop multiple columns by name
dataset.drop(
    [
        "MPI: Value for the country.1",
        "a",
        "a.1",
        "a.2",
        "a.3",
        "Health",
        "Education",
        "a.4",
        "Nutrition",
        "Child mortality",
        "Years of schooling",
        "School attendance",
        "Cooking fuel",
        "Sanitation",
        "Drinking water",
        "Electricity",
        "Housing",
        "a.5",
    ],
    axis=1,
    inplace=True,
)
dataset
| | Country | Type of survey | Survey year | Ethnic/racial/caste group | MPI: Value for the country | MPI: Value for the group | Headcount (%) | Number of multidimensionally poor people by group (thousands) | Intensity of deprivation (%) | Health (%) | ... | Cooking fuel (%) | Sanitation (%) | Drinking water (%) | Electricity (%) | Housing (%) | Assets (%) | Population share by group (%) | Population size by group (thousands) | Population size (thousands) | Region |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | NaN | NaN | NaN | NaN | NaN | (%) | (thousands) | (%) | (%) | ... | (%) | (%) | (%) | (%) | (%) | (%) | (%) | (thousands) | (thousands) | NaN |
1 | Bangladesh | MICS | 2019 | Bengali | 0.104060 | 0.102702 | 24.3847593665123 | 39284.990510756514 | 42.117223143577576 | 17.44110882282257 | ... | 12.484664469957352 | 8.274626731872559 | 0.5694943480193615 | 2.3087142035365105 | 12.478602677583694 | 8.562996238470078 | 98.80924224853516 | 161104.68805653573 | 163046.173 | South Asia |
2 | Bangladesh | MICS | 2019 | Other | 0.104060 | 0.216783 | 45.86809277534485 | 890.521139986871 | 47.26235568523407 | 10.88151652365923 | ... | 11.73345148563385 | 10.198139399290085 | 8.354675769805908 | 8.593330532312393 | 11.536150425672531 | 10.738271474838257 | 1.1907564476132393 | 1941.4828175841367 | 163046.173 | South Asia |
3 | Belize | MICS | 2015/2016 | Creole | 0.017109 | 0.003768 | 1.0518179275095463 | 0.9408810012811046 | 35.820525884628296 | 52.086931467056274 | ... | 1.1262305080890656 | 3.9643649011850357 | 1.1262305080890656 | 3.3835913985967636 | 6.162910908460617 | 4.409920796751976 | 22.916001081466675 | 89.45283938151599 | 390.351 | Latin America and the Caribbean |
4 | Belize | MICS | 2015/2016 | Garifuna | 0.017109 | 0.003887 | 1.0970834642648697 | 0.22489129056550966 | 35.43311357498169 | 85.18490195274353 | ... | 2.963019721210003 | 2.963019721210003 | 0 | 2.963019721210003 | 2.963019721210003 | 2.963019721210003 | 5.251431092619896 | 20.49901378435269 | 390.351 | Latin America and the Caribbean |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
317 | Column 3: Refers to the self-identified ethnic... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
318 | Columns 4-21: HDRO and OPHI calculations based... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
319 | Columns 22 and 23: HDRO and OPHI calculations ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
320 | Column 24: United Nations Department of Econom... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
321 | Column 25: UNDP classification of developing r... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
322 rows × 26 columns
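Because the drop above modifies `dataset` in place, running the same cell a second time raises a `KeyError`: the named columns no longer exist. One possible alternative, sketched here on a made-up `toy` frame, is to pass the names through the `columns=` keyword and set `errors="ignore"` so that labels that are already gone are skipped:

```python
import pandas as pd

# Hypothetical toy frame mimicking the leftover, mostly empty helper columns
toy = pd.DataFrame(
    {
        "Health (%)": [17.4, 10.9],
        "Health": [None, None],
        "a": [None, None],
        "Region": ["South Asia", "South Asia"],
    }
)

unwanted = ["Health", "a"]

# columns= makes the intent explicit without needing axis=1
trimmed = toy.drop(columns=unwanted)

# errors="ignore" lets the same call run again without raising KeyError
trimmed_again = trimmed.drop(columns=unwanted, errors="ignore")

print(list(trimmed_again.columns))
```

Returning a new frame instead of using `inplace=True` also leaves the original `toy` untouched, which makes it easier to compare before and after.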
# Drop the first row, which holds only units (these are now part of the column headers)
dataset.drop([0], axis=0, inplace=True)
dataset
| | Country | Type of survey | Survey year | Ethnic/racial/caste group | MPI: Value for the country | MPI: Value for the group | Headcount (%) | Number of multidimensionally poor people by group (thousands) | Intensity of deprivation (%) | Health (%) | ... | Cooking fuel (%) | Sanitation (%) | Drinking water (%) | Electricity (%) | Housing (%) | Assets (%) | Population share by group (%) | Population size by group (thousands) | Population size (thousands) | Region |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Bangladesh | MICS | 2019 | Bengali | 0.104060 | 0.102702 | 24.3847593665123 | 39284.990510756514 | 42.117223143577576 | 17.44110882282257 | ... | 12.484664469957352 | 8.274626731872559 | 0.5694943480193615 | 2.3087142035365105 | 12.478602677583694 | 8.562996238470078 | 98.80924224853516 | 161104.68805653573 | 163046.173 | South Asia |
2 | Bangladesh | MICS | 2019 | Other | 0.104060 | 0.216783 | 45.86809277534485 | 890.521139986871 | 47.26235568523407 | 10.88151652365923 | ... | 11.73345148563385 | 10.198139399290085 | 8.354675769805908 | 8.593330532312393 | 11.536150425672531 | 10.738271474838257 | 1.1907564476132393 | 1941.4828175841367 | 163046.173 | South Asia |
3 | Belize | MICS | 2015/2016 | Creole | 0.017109 | 0.003768 | 1.0518179275095463 | 0.9408810012811046 | 35.820525884628296 | 52.086931467056274 | ... | 1.1262305080890656 | 3.9643649011850357 | 1.1262305080890656 | 3.3835913985967636 | 6.162910908460617 | 4.409920796751976 | 22.916001081466675 | 89.45283938151599 | 390.351 | Latin America and the Caribbean |
4 | Belize | MICS | 2015/2016 | Garifuna | 0.017109 | 0.003887 | 1.0970834642648697 | 0.22489129056550966 | 35.43311357498169 | 85.18490195274353 | ... | 2.963019721210003 | 2.963019721210003 | 0 | 2.963019721210003 | 2.963019721210003 | 2.963019721210003 | 5.251431092619896 | 20.49901378435269 | 390.351 | Latin America and the Caribbean |
5 | Belize | MICS | 2015/2016 | Maya | 0.017109 | 0.078922 | 18.63195300102234 | 8.557939781538732 | 42.35815107822418 | 37.91183978319168 | ... | 11.931631714105606 | 7.811719179153442 | 2.3195721209049225 | 9.465593844652176 | 11.165109276771545 | 4.2670805007219315 | 11.766723543405533 | 45.93152301891893 | 390.351 | Latin America and the Caribbean |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
317 | Column 3: Refers to the self-identified ethnic... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
318 | Columns 4-21: HDRO and OPHI calculations based... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
319 | Columns 22 and 23: HDRO and OPHI calculations ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
320 | Column 24: United Nations Department of Econom... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
321 | Column 25: UNDP classification of developing r... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
321 rows × 26 columns
dataset.isnull().sum()
Country 3
Type of survey 25
Survey year 25
Ethnic/racial/caste group 25
MPI: Value for the country 25
MPI: Value for the group 25
Headcount (%) 25
Number of multidimensionally poor people by group (thousands) 25
Intensity of deprivation (%) 25
Health (%) 25
Education (%) 25
Standard of living (%) 25
Nutrition (%) 25
Child mortality (%) 25
Years of schooling (%) 25
School attendance (%) 25
Cooking fuel (%) 25
Sanitation (%) 25
Drinking water (%) 25
Electricity (%) 25
Housing (%) 25
Assets (%) 25
Population share by group (%) 25
Population size by group (thousands) 25
Population size (thousands) 26
Region 25
dtype: int64
# Drop the last n rows, which hold source notes rather than data
n = 25
dataset.drop(dataset.tail(n).index, inplace=True)
dataset
| | Country | Type of survey | Survey year | Ethnic/racial/caste group | MPI: Value for the country | MPI: Value for the group | Headcount (%) | Number of multidimensionally poor people by group (thousands) | Intensity of deprivation (%) | Health (%) | ... | Cooking fuel (%) | Sanitation (%) | Drinking water (%) | Electricity (%) | Housing (%) | Assets (%) | Population share by group (%) | Population size by group (thousands) | Population size (thousands) | Region |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Bangladesh | MICS | 2019 | Bengali | 0.104060 | 0.102702 | 24.3847593665123 | 39284.990510756514 | 42.117223143577576 | 17.44110882282257 | ... | 12.484664469957352 | 8.274626731872559 | 0.5694943480193615 | 2.3087142035365105 | 12.478602677583694 | 8.562996238470078 | 98.80924224853516 | 161104.68805653573 | 163046.173 | South Asia |
2 | Bangladesh | MICS | 2019 | Other | 0.104060 | 0.216783 | 45.86809277534485 | 890.521139986871 | 47.26235568523407 | 10.88151652365923 | ... | 11.73345148563385 | 10.198139399290085 | 8.354675769805908 | 8.593330532312393 | 11.536150425672531 | 10.738271474838257 | 1.1907564476132393 | 1941.4828175841367 | 163046.173 | South Asia |
3 | Belize | MICS | 2015/2016 | Creole | 0.017109 | 0.003768 | 1.0518179275095463 | 0.9408810012811046 | 35.820525884628296 | 52.086931467056274 | ... | 1.1262305080890656 | 3.9643649011850357 | 1.1262305080890656 | 3.3835913985967636 | 6.162910908460617 | 4.409920796751976 | 22.916001081466675 | 89.45283938151599 | 390.351 | Latin America and the Caribbean |
4 | Belize | MICS | 2015/2016 | Garifuna | 0.017109 | 0.003887 | 1.0970834642648697 | 0.22489129056550966 | 35.43311357498169 | 85.18490195274353 | ... | 2.963019721210003 | 2.963019721210003 | 0 | 2.963019721210003 | 2.963019721210003 | 2.963019721210003 | 5.251431092619896 | 20.49901378435269 | 390.351 | Latin America and the Caribbean |
5 | Belize | MICS | 2015/2016 | Maya | 0.017109 | 0.078922 | 18.63195300102234 | 8.557939781538732 | 42.35815107822418 | 37.91183978319168 | ... | 11.931631714105606 | 7.811719179153442 | 2.3195721209049225 | 9.465593844652176 | 11.165109276771545 | 4.2670805007219315 | 11.766723543405533 | 45.93152301891893 | 390.351 | Latin America and the Caribbean |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
292 | Uganda | DHS | 2016 | Lango | 0.281028 | 0.331589 | 67.10416078567505 | 1765.8691906057863 | 49.414002895355225 | 22.405487298965454 | ... | 11.113078892230988 | 10.360502451658249 | 8.658754825592041 | 10.666937381029129 | 9.859275072813034 | 4.350616410374641 | 5.94433955848217 | 2631.53457241768 | 44269.587 | Sub-Saharan Africa |
293 | Uganda | DHS | 2016 | Lugbara | 0.281028 | 0.380233 | 71.11892700195312 | 809.2593583809123 | 53.46432328224182 | 21.582941338419914 | ... | 10.391145944595337 | 10.167766362428665 | 7.661116868257523 | 8.479014784097672 | 9.50494185090065 | 5.541577190160751 | 2.5703784078359604 | 1137.8959054861552 | 44269.587 | Sub-Saharan Africa |
294 | Uganda | DHS | 2016 | Other | 0.281028 | 0.348234 | 66.4842426776886 | 5747.428972023331 | 52.37846374511719 | 24.311363324522972 | ... | 10.578002780675888 | 9.857720136642456 | 8.236721158027649 | 9.430650621652603 | 9.725867211818695 | 5.491462349891663 | 19.527624547481537 | 8644.798738080695 | 44269.587 | Sub-Saharan Africa |
295 | Viet Nam | MICS | 2013/2014 | Ethnic minorities | 0.019334 | 0.070516 | 16.658276319503784 | 2241.5703036337763 | 42.33120679855347 | 14.158439636230469 | ... | 12.865620851516724 | 11.510226875543594 | 5.163108557462692 | 1.8797459080815315 | 8.144484460353851 | 3.68308387696743 | 13.949722051620483 | 13456.195951133965 | 96462.108 | East Asia and the Paficic |
296 | Viet Nam | MICS | 2013/2014 | Kinh/Hoa | 0.019334 | 0.011037 | 2.988246828317642 | 2480.421534116371 | 36.93450093269348 | 16.32026880979538 | ... | 12.656877934932709 | 11.753645539283752 | 3.3999428153038025 | 0.6741166580468416 | 9.58368182182312 | 2.9566552489995956 | 86.05027794837952 | 83005.91204886603 | 96462.108 | East Asia and the Paficic |
296 rows × 26 columns
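Counting the trailing note rows by hand works, but `dropna` can also remove them without a hard-coded row count. The sketch below is one possible approach, shown on a made-up `toy` frame and assuming that a note row has at most one non-missing cell (the note text in the first column), as the null counts above suggest:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: two data rows, one note row, one fully empty row
toy = pd.DataFrame(
    {
        "Country": ["Bangladesh", "Belize", "Column 3: note text", np.nan],
        "MPI": [0.104, 0.017, np.nan, np.nan],
        "Region": ["South Asia", "Latin America and the Caribbean", np.nan, np.nan],
    }
)

# thresh=2 keeps only rows with at least two non-missing values
data_rows = toy.dropna(thresh=2)

# Alternative: require a value in a column that note rows never fill
data_rows_by_subset = toy.dropna(subset=["Region"])

print(data_rows)
```

Both variants keep the row with a single missing value in an otherwise complete record, so a row like the one with the missing population size would survive this step.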
dataset.isnull().sum()
Country 0
Type of survey 0
Survey year 0
Ethnic/racial/caste group 0
MPI: Value for the country 0
MPI: Value for the group 0
Headcount (%) 0
Number of multidimensionally poor people by group (thousands) 0
Intensity of deprivation (%) 0
Health (%) 0
Education (%) 0
Standard of living (%) 0
Nutrition (%) 0
Child mortality (%) 0
Years of schooling (%) 0
School attendance (%) 0
Cooking fuel (%) 0
Sanitation (%) 0
Drinking water (%) 0
Electricity (%) 0
Housing (%) 0
Assets (%) 0
Population share by group (%) 0
Population size by group (thousands) 0
Population size (thousands) 1
Region 0
dtype: int64
null_data = dataset[dataset.isnull().any(axis=1)]
null_data
| | Country | Type of survey | Survey year | Ethnic/racial/caste group | MPI: Value for the country | MPI: Value for the group | Headcount (%) | Number of multidimensionally poor people by group (thousands) | Intensity of deprivation (%) | Health (%) | ... | Cooking fuel (%) | Sanitation (%) | Drinking water (%) | Electricity (%) | Housing (%) | Assets (%) | Population share by group (%) | Population size by group (thousands) | Population size (thousands) | Region |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
96 | Georgia | MICS | 2018 | Other | 0.001245 | 0.0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 2.4445995688438416 | 97.70482661971451 | NaN | Europe and Central Asia |
1 rows × 26 columns
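The row filter above combines two steps: `isnull()` builds a Boolean frame, and `any(axis=1)` collapses it to one flag per row. A small sketch on a made-up `toy` frame (names and values are illustrative only) shows both directions, rows and columns:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a single missing cell
toy = pd.DataFrame(
    {
        "Country": ["X", "X", "Y"],
        "Group": ["g1", "g2", "g3"],
        "Population": [10.0, np.nan, 20.0],
    }
)

# One Boolean per row: True if any cell in that row is missing
row_has_null = toy.isnull().any(axis=1)
null_rows = toy[row_has_null]

# One Boolean per column: True if any cell in that column is missing
null_columns = toy.columns[toy.isnull().any(axis=0)].tolist()

print(null_rows)
print(null_columns)
```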
# Population size is a country-level value, so it is known from the preceding rows; forward-fill it
# dataset.fillna({"Population size (thousands)": 3996.762}, inplace=True)
dataset["Population size (thousands)"] = dataset["Population size (thousands)"].ffill()
dataset.isnull().sum()
Country 0
Type of survey 0
Survey year 0
Ethnic/racial/caste group 0
MPI: Value for the country 0
MPI: Value for the group 0
Headcount (%) 0
Number of multidimensionally poor people by group (thousands) 0
Intensity of deprivation (%) 0
Health (%) 0
Education (%) 0
Standard of living (%) 0
Nutrition (%) 0
Child mortality (%) 0
Years of schooling (%) 0
School attendance (%) 0
Cooking fuel (%) 0
Sanitation (%) 0
Drinking water (%) 0
Electricity (%) 0
Housing (%) 0
Assets (%) 0
Population share by group (%) 0
Population size by group (thousands) 0
Population size (thousands) 0
Region 0
dtype: int64
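A forward fill copies the value from the row directly above, whichever country that row belongs to. Here that is safe because the missing cell is not the first row of its country. As a more defensive variant, the sketch below (made-up `toy` frame, hypothetical column names) fills only within each country group, so a gap in the first row of a country is left missing instead of inheriting the previous country's value:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: country B has a gap in its first row
toy = pd.DataFrame(
    {
        "Country": ["A", "A", "B", "B"],
        "Population": [100.0, np.nan, np.nan, 200.0],
    }
)

# Plain forward fill: the first row of B wrongly inherits A's value
plain = toy["Population"].ffill()

# Grouped forward fill: values never cross a country boundary
grouped = toy.groupby("Country")["Population"].ffill()

print(plain.tolist())
print(grouped.tolist())
```

Any gap that remains after the grouped fill then needs an explicit decision, for example a looked-up value passed to `fillna`.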
dataset.to_csv(f"{data_path}/cleaned_mpi_disagg_by_groups.csv", index=False)