User-defined functions
User-defined functions can be applied to a pandas column through the apply method. Much like Python's built-in map function, apply accepts a user-defined function plus optional additional arguments, calls the function on each element of the column, and returns a new column (a pandas Series) holding the results. The original column is left unchanged.
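As a minimal sketch of the element-wise behaviour described above, the snippet below applies a small function to a toy Series. The function `scale_and_shift`, its parameters, and the values in `scores` are hypothetical and exist only for illustration; extra positional arguments are forwarded through `args`, and extra keyword arguments are passed straight to the function.

```python
import pandas as pd


def scale_and_shift(value, factor, offset=0.0):
    """Hypothetical helper: multiply a single value by factor, then add offset."""
    return value * factor + offset


# Toy data, not taken from the MPI dataset
scores = pd.Series([1.0, 2.0, 3.0], name="score")

# apply calls scale_and_shift once per element;
# args supplies `factor`, the keyword argument supplies `offset`
scaled = scores.apply(scale_and_shift, args=(10,), offset=0.5)

print(scaled.tolist())  # [10.5, 20.5, 30.5]
print(scores.tolist())  # the original Series is unchanged: [1.0, 2.0, 3.0]
```

The result is a new Series with the same index as the input, which is why it can be assigned back to a DataFrame as a new column.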
Importing libraries and packages
# Mathematical operations and data manipulation
import pandas as pd
Set paths
# Path to datasets directory
data_path = "./datasets"
# Path to assets directory (for saving results to)
assets_path = "./assets"
Loading dataset
dataset = pd.read_csv(f"{data_path}/cleaned_mpi_disagg_by_groups.csv")
Wrangling
dataset
| | Country | Type of survey | Survey year | Ethnic/racial/caste group | MPI: Value for the country | MPI: Value for the group | Headcount (%) | Number of multidimensionally poor people by group (thousands) | Intensity of deprivation (%) | Health (%) | ... | Cooking fuel (%) | Sanitation (%) | Drinking water (%) | Electricity (%) | Housing (%) | Assets (%) | Population share by group (%) | Population size by group (thousands) | Population size (thousands) | Region |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Bangladesh | MICS | 2019 | Bengali | 0.104060 | 0.102702 | 24.384759 | 39284.990511 | 42.117223 | 17.441109 | ... | 12.484664 | 8.274627 | 0.569494 | 2.308714 | 12.478603 | 8.562996 | 98.809242 | 161104.688057 | 163046.173 | South Asia |
1 | Bangladesh | MICS | 2019 | Other | 0.104060 | 0.216783 | 45.868093 | 890.521140 | 47.262356 | 10.881517 | ... | 11.733451 | 10.198139 | 8.354676 | 8.593331 | 11.536150 | 10.738271 | 1.190756 | 1941.482818 | 163046.173 | South Asia |
2 | Belize | MICS | 2015/2016 | Creole | 0.017109 | 0.003768 | 1.051818 | 0.940881 | 35.820526 | 52.086931 | ... | 1.126231 | 3.964365 | 1.126231 | 3.383591 | 6.162911 | 4.409921 | 22.916001 | 89.452839 | 390.351 | Latin America and the Caribbean |
3 | Belize | MICS | 2015/2016 | Garifuna | 0.017109 | 0.003887 | 1.097083 | 0.224891 | 35.433114 | 85.184902 | ... | 2.963020 | 2.963020 | 0.000000 | 2.963020 | 2.963020 | 2.963020 | 5.251431 | 20.499014 | 390.351 | Latin America and the Caribbean |
4 | Belize | MICS | 2015/2016 | Maya | 0.017109 | 0.078922 | 18.631953 | 8.557940 | 42.358151 | 37.911840 | ... | 11.931632 | 7.811719 | 2.319572 | 9.465594 | 11.165109 | 4.267081 | 11.766724 | 45.931523 | 390.351 | Latin America and the Caribbean |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
291 | Uganda | DHS | 2016 | Lango | 0.281028 | 0.331589 | 67.104161 | 1765.869191 | 49.414003 | 22.405487 | ... | 11.113079 | 10.360502 | 8.658755 | 10.666937 | 9.859275 | 4.350616 | 5.944340 | 2631.534572 | 44269.587 | Sub-Saharan Africa |
292 | Uganda | DHS | 2016 | Lugbara | 0.281028 | 0.380233 | 71.118927 | 809.259358 | 53.464323 | 21.582941 | ... | 10.391146 | 10.167766 | 7.661117 | 8.479015 | 9.504942 | 5.541577 | 2.570378 | 1137.895905 | 44269.587 | Sub-Saharan Africa |
293 | Uganda | DHS | 2016 | Other | 0.281028 | 0.348234 | 66.484243 | 5747.428972 | 52.378464 | 24.311363 | ... | 10.578003 | 9.857720 | 8.236721 | 9.430651 | 9.725867 | 5.491462 | 19.527625 | 8644.798738 | 44269.587 | Sub-Saharan Africa |
294 | Viet Nam | MICS | 2013/2014 | Ethnic minorities | 0.019334 | 0.070516 | 16.658276 | 2241.570304 | 42.331207 | 14.158440 | ... | 12.865621 | 11.510227 | 5.163109 | 1.879746 | 8.144484 | 3.683084 | 13.949722 | 13456.195951 | 96462.108 | East Asia and the Paficic |
295 | Viet Nam | MICS | 2013/2014 | Kinh/Hoa | 0.019334 | 0.011037 | 2.988247 | 2480.421534 | 36.934501 | 16.320269 | ... | 12.656878 | 11.753646 | 3.399943 | 0.674117 | 9.583682 | 2.956655 | 86.050278 | 83005.912049 | 96462.108 | East Asia and the Paficic |
296 rows × 26 columns
# Select the first 20 rows (index labels 0 to 19) and three columns of interest
dataset_subset = dataset.loc[
    list(range(20)),
    ["Country", "MPI: Value for the country", "Intensity of deprivation (%)"],
]
print(dataset_subset)
Country MPI: Value for the country \
0 Bangladesh 0.104060
1 Bangladesh 0.104060
2 Belize 0.017109
3 Belize 0.017109
4 Belize 0.017109
5 Belize 0.017109
6 Belize 0.017109
7 Bolivia, Plurinational State of 0.037754
8 Bolivia, Plurinational State of 0.037754
9 Bolivia, Plurinational State of 0.037754
10 Bolivia, Plurinational State of 0.037754
11 Bolivia, Plurinational State of 0.037754
12 Burkina Faso 0.523424
13 Burkina Faso 0.523424
14 Burkina Faso 0.523424
15 Burkina Faso 0.523424
16 Burkina Faso 0.523424
17 Burkina Faso 0.523424
18 Burkina Faso 0.523424
19 Burkina Faso 0.523424
Intensity of deprivation (%)
0 42.117223
1 47.262356
2 35.820526
3 35.433114
4 42.358151
5 36.699757
6 39.199564
7 37.935901
8 33.333334
9 41.581705
10 43.263215
11 43.184847
12 55.149454
13 56.443775
14 62.004858
15 53.393632
16 68.189025
17 70.047671
18 59.310508
19 70.925540
def categorize_iop(iop):
    """Map an intensity-of-deprivation percentage to a Low/Medium/High label."""
    if iop < 10:
        return "Low Intensity of deprivation (%)"
    elif iop < 40:
        return "Medium Intensity of deprivation (%)"
    else:
        return "High Intensity of deprivation (%)"
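To make the thresholds concrete, the function can be called directly on a few values before it is applied to a whole column. The sketch below restates the function so that it runs on its own; the input values are hypothetical and were chosen to sit on and around the two cut-offs.

```python
def categorize_iop(iop):
    """Same thresholds as the function defined above."""
    if iop < 10:
        return "Low Intensity of deprivation (%)"
    elif iop < 40:
        return "Medium Intensity of deprivation (%)"
    else:
        return "High Intensity of deprivation (%)"


# Hypothetical inputs around the 10 and 40 boundaries
for value in [0.0, 9.99, 10.0, 39.99, 40.0, 70.0]:
    print(value, "->", categorize_iop(value))
```

Because both comparisons are strict, a value of exactly 10 is labelled Medium and a value of exactly 40 is labelled High.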
# Draw a random sample of 100 rows, keeping five columns of interest
dataset_sample = dataset[
    [
        "Country",
        "MPI: Value for the country",
        "Intensity of deprivation (%)",
        "Ethnic/racial/caste group",
        "Number of multidimensionally poor people by group (thousands)",
    ]
].sample(n=100)
dataset_sample
| | Country | MPI: Value for the country | Intensity of deprivation (%) | Ethnic/racial/caste group | Number of multidimensionally poor people by group (thousands) |
---|---|---|---|---|---|
7 | Bolivia, Plurinational State of | 0.037754 | 37.935901 | Aymara | 223.806399 |
69 | Cuba | 0.002689 | 38.146138 | Mulato/Mestizo/Other | 23.120644 |
247 | Sierra Leone | 0.292899 | 52.130735 | Korankoh | 212.726294 |
193 | Mongolia | 0.028127 | 40.553683 | Other | 57.338328 |
242 | Serbia | 0.000433 | 39.860407 | Roma | 2.019261 |
... | ... | ... | ... | ... | ... |
156 | Kyrgyzstan | 0.001426 | 0.000000 | Russian | 0.000000 |
198 | Nigeria | 0.254390 | 45.378658 | Igala | 635.924786 |
208 | Paraguay | 0.018849 | 36.654904 | Guaraní and Spanish speaker | 25.251111 |
234 | Senegal | 0.262862 | 49.028081 | Other /non Senegalese | 357.846501 |
249 | Sierra Leone | 0.292899 | 44.758216 | Loko | 65.443718 |
100 rows × 5 columns
# Run categorize_iop on every value of the column and store the labels in a new column
dataset_sample["Intensity of deprivation Category"] = dataset_sample[
    "Intensity of deprivation (%)"
].apply(categorize_iop)
dataset_sample.head(10)
| | Country | MPI: Value for the country | Intensity of deprivation (%) | Ethnic/racial/caste group | Number of multidimensionally poor people by group (thousands) | Intensity of deprivation Category |
---|---|---|---|---|---|---|
7 | Bolivia, Plurinational State of | 0.037754 | 37.935901 | Aymara | 223.806399 | Medium Intensity of deprivation (%) |
69 | Cuba | 0.002689 | 38.146138 | Mulato/Mestizo/Other | 23.120644 | Medium Intensity of deprivation (%) |
247 | Sierra Leone | 0.292899 | 52.130735 | Korankoh | 212.726294 | High Intensity of deprivation (%) |
193 | Mongolia | 0.028127 | 40.553683 | Other | 57.338328 | High Intensity of deprivation (%) |
242 | Serbia | 0.000433 | 39.860407 | Roma | 2.019261 | Medium Intensity of deprivation (%) |
158 | Lao People's Democratic Republic | 0.108333 | 46.754810 | Chinese-Tibetan | 86.627640 | High Intensity of deprivation (%) |
154 | Kyrgyzstan | 0.001426 | 36.415604 | Kyrgyz | 12.373465 | Medium Intensity of deprivation (%) |
88 | Gambia | 0.203638 | 45.463789 | Mandinka | 261.295819 | High Intensity of deprivation (%) |
295 | Viet Nam | 0.019334 | 36.934501 | Kinh/Hoa | 2480.421534 | Medium Intensity of deprivation (%) |
84 | Gabon | 0.069695 | 66.626418 | Pygmée | 5.908309 | High Intensity of deprivation (%) |
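For simple threshold-based labels like these, a vectorized alternative to apply is `pd.cut`. This is a different technique from the one used in this section, shown only for comparison. The sketch below uses a small hypothetical Series rather than the real dataset and checks that both approaches give the same labels.

```python
import pandas as pd


def categorize_iop(iop):
    """Same thresholds as the function used with apply above."""
    if iop < 10:
        return "Low Intensity of deprivation (%)"
    elif iop < 40:
        return "Medium Intensity of deprivation (%)"
    else:
        return "High Intensity of deprivation (%)"


# Hypothetical values, not taken from the MPI dataset
intensity = pd.Series(
    [0.0, 9.99, 10.0, 39.86, 40.0, 70.93], name="Intensity of deprivation (%)"
)

labels = [
    "Low Intensity of deprivation (%)",
    "Medium Intensity of deprivation (%)",
    "High Intensity of deprivation (%)",
]

# right=False makes each bin closed on the left: [-inf, 10), [10, 40), [40, inf)
binned = pd.cut(
    intensity,
    bins=[float("-inf"), 10, 40, float("inf")],
    right=False,
    labels=labels,
)

applied = intensity.apply(categorize_iop)

# Compare the two label columns element by element
print((binned.astype(str) == applied).all())
```

One difference to keep in mind: for a missing value (NaN), `categorize_iop` falls through to the High label because both comparisons are false, whereas `pd.cut` leaves the result missing.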