Sorting by column values

Sorting rows by one or more columns makes a dataset easier to inspect and gives a specific view of it. Row order can also matter when training a machine learning model: if the sampling or train/test split depends on row order (for example, a split taken without shuffling), the way the data is sorted can affect the model's performance.
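To illustrate how row order can interact with sampling, here is a minimal sketch on a hypothetical toy frame. The `toy` data, its column names, and the 50% split are assumptions made for illustration and are not part of the MPI dataset. Taking the first half of a frame sorted by label gives a training set containing a single label, whereas shuffling first will typically mix the labels.

```python
import pandas as pd

# Hypothetical toy data (not from the MPI dataset): ten rows, two labels.
toy = pd.DataFrame(
    {
        "feature": [5, 3, 8, 1, 9, 2, 7, 4, 6, 0],
        "label": ["a", "b", "a", "b", "a", "b", "a", "b", "a", "b"],
    }
)

# Sort by label, then take the first half as a "training set" without shuffling.
sorted_toy = toy.sort_values(by="label")
train_sorted = sorted_toy.iloc[:5]

# Shuffle the sorted rows first (fixed seed for reproducibility), then split.
shuffled_toy = sorted_toy.sample(frac=1, random_state=0)
train_shuffled = shuffled_toy.iloc[:5]

# Label counts in each training set: the sorted split only ever sees label "a".
print(train_sorted["label"].value_counts().to_dict())
print(train_shuffled["label"].value_counts().to_dict())
```

Many splitting utilities shuffle by default, but it is worth checking whenever the data has been sorted beforehand.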

Importing libraries and packages

# Mathematical operations and data manipulation
import pandas as pd

Set paths

# Path to datasets directory
data_path = "./datasets"
# Path to assets directory (where results are saved)
assets_path = "./assets"

Loading dataset

dataset = pd.read_csv(f"{data_path}/cleaned_mpi_disagg_by_groups.csv")

Wrangling

dataset.head()

| | Country | Type of survey | Survey year | Ethnic/racial/caste group | MPI: Value for the country | MPI: Value for the group | Headcount (%) | Number of multidimensionally poor people by group (thousands) | Intensity of deprivation (%) | Health (%) | ... | Cooking fuel (%) | Sanitation (%) | Drinking water (%) | Electricity (%) | Housing (%) | Assets (%) | Population share by group (%) | Population size by group (thousands) | Population size (thousands) | Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bangladesh | MICS | 2019 | Bengali | 0.104060 | 0.102702 | 24.384759 | 39284.990511 | 42.117223 | 17.441109 | ... | 12.484664 | 8.274627 | 0.569494 | 2.308714 | 12.478603 | 8.562996 | 98.809242 | 161104.688057 | 163046.173 | South Asia |
| 1 | Bangladesh | MICS | 2019 | Other | 0.104060 | 0.216783 | 45.868093 | 890.521140 | 47.262356 | 10.881517 | ... | 11.733451 | 10.198139 | 8.354676 | 8.593331 | 11.536150 | 10.738271 | 1.190756 | 1941.482818 | 163046.173 | South Asia |
| 2 | Belize | MICS | 2015/2016 | Creole | 0.017109 | 0.003768 | 1.051818 | 0.940881 | 35.820526 | 52.086931 | ... | 1.126231 | 3.964365 | 1.126231 | 3.383591 | 6.162911 | 4.409921 | 22.916001 | 89.452839 | 390.351 | Latin America and the Caribbean |
| 3 | Belize | MICS | 2015/2016 | Garifuna | 0.017109 | 0.003887 | 1.097083 | 0.224891 | 35.433114 | 85.184902 | ... | 2.963020 | 2.963020 | 0.000000 | 2.963020 | 2.963020 | 2.963020 | 5.251431 | 20.499014 | 390.351 | Latin America and the Caribbean |
| 4 | Belize | MICS | 2015/2016 | Maya | 0.017109 | 0.078922 | 18.631953 | 8.557940 | 42.358151 | 37.911840 | ... | 11.931632 | 7.811719 | 2.319572 | 9.465594 | 11.165109 | 4.267081 | 11.766724 | 45.931523 | 390.351 | Latin America and the Caribbean |

5 rows × 26 columns

# Keep five columns and draw 15 random rows to work with
dataset_sample = dataset[
    [
        "Country",
        "MPI: Value for the country",
        "Intensity of deprivation (%)",
        "Ethnic/racial/caste group",
        "Number of multidimensionally poor people by group (thousands)",
    ]
].sample(n=15)
dataset_sample

| | Country | MPI: Value for the country | Intensity of deprivation (%) | Ethnic/racial/caste group | Number of multidimensionally poor people by group (thousands) |
|---|---|---|---|---|---|
| 164 | Malawi | 0.252325 | 45.654917 | Lomwe | 1605.694658 |
| 226 | Philippines | 0.024249 | 44.883090 | Maranao | 290.285282 |
| 290 | Uganda | 0.281028 | 48.582491 | Iteso | 2146.800347 |
| 170 | Malawi | 0.252325 | 48.004910 | Sena | 338.484478 |
| 40 | Chad | 0.517011 | 52.718043 | Gabri/Nangtchére | 196.955229 |
| 201 | Nigeria | 0.254390 | 60.869312 | Kanuri/Beriberi | 3667.493263 |
| 293 | Uganda | 0.281028 | 52.378464 | Other | 5747.428972 |
| 151 | Kenya | 0.170776 | 57.261908 | Somali | 1170.103478 |
| 149 | Kenya | 0.170776 | 50.156796 | Other | 1025.676048 |
| 72 | Ecuador | 0.018254 | 42.166322 | Indigenous | 250.663831 |
| 47 | Chad | 0.517011 | 49.858367 | Moundang | 372.385169 |
| 146 | Kenya | 0.170776 | 50.231904 | Maasai | 687.913320 |
| 233 | Senegal | 0.262862 | 52.941322 | Mandingue/ Socé | 420.763839 |
| 249 | Sierra Leone | 0.292899 | 44.758216 | Loko | 65.443718 |
| 11 | Bolivia, Plurinational State of | 0.037754 | 43.184847 | Quechua | 441.861271 |
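Because `sample` draws rows at random, the 15 rows shown above will differ from run to run. If a repeatable sample is wanted, one option is to pass `random_state`. The sketch below uses a hypothetical toy frame (`toy`, `group_id`, and `value` are made-up names) and an arbitrary seed of 42.

```python
import pandas as pd

# Hypothetical stand-in for the MPI dataset: 20 rows with an id and a value.
toy = pd.DataFrame({"group_id": range(20), "value": [i * 1.5 for i in range(20)]})

# The same seed selects the same rows on every call.
sample_a = toy.sample(n=5, random_state=42)
sample_b = toy.sample(n=5, random_state=42)

print(sample_a.equals(sample_b))  # True
```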

# Sort rows alphabetically by country (ascending by default; returns a new DataFrame)
dataset_sample.sort_values(by="Country")

| | Country | MPI: Value for the country | Intensity of deprivation (%) | Ethnic/racial/caste group | Number of multidimensionally poor people by group (thousands) |
|---|---|---|---|---|---|
| 11 | Bolivia, Plurinational State of | 0.037754 | 43.184847 | Quechua | 441.861271 |
| 40 | Chad | 0.517011 | 52.718043 | Gabri/Nangtchére | 196.955229 |
| 47 | Chad | 0.517011 | 49.858367 | Moundang | 372.385169 |
| 72 | Ecuador | 0.018254 | 42.166322 | Indigenous | 250.663831 |
| 151 | Kenya | 0.170776 | 57.261908 | Somali | 1170.103478 |
| 149 | Kenya | 0.170776 | 50.156796 | Other | 1025.676048 |
| 146 | Kenya | 0.170776 | 50.231904 | Maasai | 687.913320 |
| 164 | Malawi | 0.252325 | 45.654917 | Lomwe | 1605.694658 |
| 170 | Malawi | 0.252325 | 48.004910 | Sena | 338.484478 |
| 201 | Nigeria | 0.254390 | 60.869312 | Kanuri/Beriberi | 3667.493263 |
| 226 | Philippines | 0.024249 | 44.883090 | Maranao | 290.285282 |
| 233 | Senegal | 0.262862 | 52.941322 | Mandingue/ Socé | 420.763839 |
| 249 | Sierra Leone | 0.292899 | 44.758216 | Loko | 65.443718 |
| 290 | Uganda | 0.281028 | 48.582491 | Iteso | 2146.800347 |
| 293 | Uganda | 0.281028 | 52.378464 | Other | 5747.428972 |
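The sort above is ascending, which is the default for `sort_values`. As a minimal sketch of two common variations, the example below sorts in descending order with `ascending=False` and optionally renumbers the rows with `ignore_index=True`. The small `rows` frame is built by hand from four rows of the sample shown earlier and is only a stand-in for `dataset_sample`.

```python
import pandas as pd

# Four rows copied from the sample above, keeping their original row labels.
rows = pd.DataFrame(
    {
        "Country": ["Malawi", "Chad", "Nigeria", "Kenya"],
        "Intensity of deprivation (%)": [45.654917, 52.718043, 60.869312, 57.261908],
    },
    index=[164, 40, 201, 151],
)

# Highest intensity first; the original row labels travel with the rows.
by_intensity = rows.sort_values(by="Intensity of deprivation (%)", ascending=False)
print(by_intensity.index.tolist())  # [201, 151, 40, 164]

# Same order, but with a fresh 0..n-1 index.
renumbered = rows.sort_values(
    by="Intensity of deprivation (%)", ascending=False, ignore_index=True
)
print(renumbered["Country"].tolist())  # ['Nigeria', 'Kenya', 'Chad', 'Malawi']
```

In both calls `rows` itself is left unchanged, since `sort_values` returns a new DataFrame unless `inplace=True` is passed.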

# Sort by country, then by country-level MPI (constant within a country, so it breaks no ties here)
dataset_sample.sort_values(by=["Country", "MPI: Value for the country"])

| | Country | MPI: Value for the country | Intensity of deprivation (%) | Ethnic/racial/caste group | Number of multidimensionally poor people by group (thousands) |
|---|---|---|---|---|---|
| 11 | Bolivia, Plurinational State of | 0.037754 | 43.184847 | Quechua | 441.861271 |
| 40 | Chad | 0.517011 | 52.718043 | Gabri/Nangtchére | 196.955229 |
| 47 | Chad | 0.517011 | 49.858367 | Moundang | 372.385169 |
| 72 | Ecuador | 0.018254 | 42.166322 | Indigenous | 250.663831 |
| 151 | Kenya | 0.170776 | 57.261908 | Somali | 1170.103478 |
| 149 | Kenya | 0.170776 | 50.156796 | Other | 1025.676048 |
| 146 | Kenya | 0.170776 | 50.231904 | Maasai | 687.913320 |
| 164 | Malawi | 0.252325 | 45.654917 | Lomwe | 1605.694658 |
| 170 | Malawi | 0.252325 | 48.004910 | Sena | 338.484478 |
| 201 | Nigeria | 0.254390 | 60.869312 | Kanuri/Beriberi | 3667.493263 |
| 226 | Philippines | 0.024249 | 44.883090 | Maranao | 290.285282 |
| 233 | Senegal | 0.262862 | 52.941322 | Mandingue/ Socé | 420.763839 |
| 249 | Sierra Leone | 0.292899 | 44.758216 | Loko | 65.443718 |
| 290 | Uganda | 0.281028 | 48.582491 | Iteso | 2146.800347 |
| 293 | Uganda | 0.281028 | 52.378464 | Other | 5747.428972 |
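In the output above, the second sort key takes the same value for every row of a given country, so the result matches the single-key sort. As a hedged sketch of a secondary key that does break ties, the example below sorts by country and then by the group-level "Intensity of deprivation (%)" column, with a separate direction per key. The `rows` frame is a hand-built stand-in containing five rows copied from the sample above.

```python
import pandas as pd

# Five rows copied from the sample above, keeping their original row labels.
rows = pd.DataFrame(
    {
        "Country": ["Chad", "Kenya", "Kenya", "Chad", "Kenya"],
        "Intensity of deprivation (%)": [
            52.718043,
            57.261908,
            50.156796,
            49.858367,
            50.231904,
        ],
    },
    index=[40, 151, 149, 47, 146],
)

keys = ["Country", "Intensity of deprivation (%)"]

# Both keys ascending: countries A to Z, lowest intensity first within each.
both_ascending = rows.sort_values(by=keys)
print(both_ascending.index.tolist())  # [47, 40, 149, 146, 151]

# One direction per key: countries A to Z, highest intensity first within each.
mixed = rows.sort_values(by=keys, ascending=[True, False])
print(mixed.index.tolist())  # [40, 47, 151, 146, 149]
```

The list passed to `ascending` must have one entry per column in `by`.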