Naïve Bayes algorithm#

The algorithm summarizes the input data as counts of how often each feature value occurs with each class label. From this summary it calculates the likelihood of an event (a class label) given a combination of features, under the "naïve" assumption that the features are independent of one another given the class. This likelihood is then normalized against the likelihoods of the other class labels.

The result is the probability of an instance belonging to each class label. These probabilities sum to one, and the algorithm predicts the class label with the highest probability.
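The counting and normalizing steps described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the implementation used later in this notebook: the toy `rows` data and the `posteriors` helper are hypothetical, the features are categorical, and no smoothing is applied, so a feature value never seen with a class gives that class a score of zero.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (weather, windy, label)
rows = [
    ("sunny", "no", "yes"),
    ("sunny", "yes", "no"),
    ("rainy", "no", "yes"),
    ("rainy", "yes", "no"),
    ("sunny", "no", "yes"),
    ("rainy", "no", "no"),
]


def posteriors(rows, query):
    """Return P(label | query) for every label, using raw counts."""
    class_counts = Counter(row[-1] for row in rows)

    # feature_counts[label][(feature_index, value)] = number of occurrences
    feature_counts = defaultdict(Counter)
    for *features, label in rows:
        for index, value in enumerate(features):
            feature_counts[label][(index, value)] += 1

    scores = {}
    for label, count in class_counts.items():
        score = count / len(rows)  # prior P(label)
        for index, value in enumerate(query):
            # P(feature value | label), features treated as independent
            score *= feature_counts[label][(index, value)] / count
        scores[label] = score

    # Normalize against the scores of the other class labels
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


probs = posteriors(rows, ("sunny", "no"))
prediction = max(probs, key=probs.get)
print(probs, prediction)
```

The predicted label is simply the key with the largest normalized score, which mirrors the decision rule described above.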

Importing libraries and packages#

# Mathematical operations and data manipulation
import pandas as pd

# Model
from sklearn.naive_bayes import GaussianNB

# Warnings
import warnings

warnings.filterwarnings("ignore")

%matplotlib inline

Set paths#

# Path to datasets directory
data_path = "./datasets"
# Path to assets directory (for saving results to)
assets_path = "./assets"

Loading dataset#

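The Fertility dataset is used to determine whether an individual's fertility level has been affected by their demographics, their environmental conditions, and their previous medical conditions. The CSV file has no header row, so it is read with header=None and pandas numbers the columns 0 to 9.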

dataset = pd.read_csv(f"{data_path}/fertility_Diagnosis.csv", header=None)

Exploring dataset#

# Shape of the dataset
print("Shape of the dataset: ", dataset.shape)
# Display the dataset (pandas truncates the middle rows)
dataset
Shape of the dataset:  (100, 10)
        0     1     2     3     4     5     6     7     8     9
0   -0.33  0.69     0     1     1     0   0.8     0  0.88     N
1   -0.33  0.94     1     0     1     0   0.8     1  0.31     O
2   -0.33  0.50     1     0     0     0   1.0    -1  0.50     N
3   -0.33  0.75     0     1     1     0   1.0    -1  0.38     N
4   -0.33  0.67     1     1     0     0   0.8    -1  0.50     O
...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...
95  -1.00  0.67     1     0     0     0   1.0    -1  0.50     N
96  -1.00  0.61     1     0     0     0   0.8     0  0.50     N
97  -1.00  0.67     1     1     1     0   1.0    -1  0.31     N
98  -1.00  0.64     1     0     1     0   1.0     0  0.19     N
99  -1.00  0.69     0     1     1     0   0.6    -1  0.19     N

100 rows × 10 columns

Modelling#

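# Features: the first nine columns; target: the last column (class label)
X = dataset.iloc[:, :9]
Y = dataset.iloc[:, 9]

# Fit a Gaussian Naïve Bayes model on the whole dataset
model = GaussianNB()
model.fit(X, Y)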
GaussianNB()
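Because the features here are numeric, GaussianNB does not count discrete occurrences. Instead it models each feature within each class as a normal distribution, described by a per-class mean and variance. The sketch below, which uses hypothetical toy data rather than the Fertility dataset, recomputes the class probabilities by hand from the fitted attributes `class_prior_`, `theta_`, and `var_`, and compares them with `predict_proba`. The variance attribute is assumed to be `var_` (recent scikit-learn releases), with a fallback to the older `sigma_` name.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy data (not the Fertility dataset): two numeric features
X_toy = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
                  [3.0, 0.5], [3.2, 0.7], [2.8, 0.4]])
y_toy = np.array(["N", "N", "N", "O", "O", "O"])

toy_model = GaussianNB().fit(X_toy, y_toy)

# Per-class feature variances: `var_` in recent releases, `sigma_` in older ones
variances = getattr(toy_model, "var_", None)
if variances is None:
    variances = toy_model.sigma_

query = np.array([1.1, 1.9])

# log P(class) + sum over features of log N(x_i; mean, variance)
log_joint = np.log(toy_model.class_prior_) + np.sum(
    -0.5 * np.log(2 * np.pi * variances)
    - 0.5 * (query - toy_model.theta_) ** 2 / variances,
    axis=1,
)

# Normalize across classes (subtract the max first for numerical stability)
unnormalized = np.exp(log_joint - log_joint.max())
manual_proba = unnormalized / unnormalized.sum()

library_proba = toy_model.predict_proba([query])[0]
print(dict(zip(toy_model.classes_, manual_proba)))
print(np.allclose(manual_proba, library_proba))
```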
# Test the model by predicting the class of a new instance with feature values
# -0.33, 0.69, 0, 1, 1, 0, 0.8, 0, 0.88
pred = model.predict([[-0.33, 0.69, 0, 1, 1, 0, 0.8, 0, 0.88]])
print(pred)
['N']
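Note that the instance above has the same feature values as row 0 of the dataset, so this prediction is a sanity check rather than an evaluation on unseen data. To see the per-class probabilities behind a prediction, `predict_proba` can be used instead of `predict`. The sketch below is self-contained and therefore fits a separate model on only the ten rows shown in the dataset preview, as a stand-in for the full file; the names `sample_csv`, `sample`, and `sample_model` are introduced here for illustration, and the probabilities will differ from those of the model fitted on all 100 rows.

```python
import io

import pandas as pd
from sklearn.naive_bayes import GaussianNB

# The ten rows visible in the dataset preview (stand-in for the full CSV file)
sample_csv = """\
-0.33,0.69,0,1,1,0,0.8,0,0.88,N
-0.33,0.94,1,0,1,0,0.8,1,0.31,O
-0.33,0.50,1,0,0,0,1.0,-1,0.50,N
-0.33,0.75,0,1,1,0,1.0,-1,0.38,N
-0.33,0.67,1,1,0,0,0.8,-1,0.50,O
-1.00,0.67,1,0,0,0,1.0,-1,0.50,N
-1.00,0.61,1,0,0,0,0.8,0,0.50,N
-1.00,0.67,1,1,1,0,1.0,-1,0.31,N
-1.00,0.64,1,0,1,0,1.0,0,0.19,N
-1.00,0.69,0,1,1,0,0.6,-1,0.19,N
"""
sample = pd.read_csv(io.StringIO(sample_csv), header=None)

sample_model = GaussianNB().fit(sample.iloc[:, :9], sample.iloc[:, 9])

query = [[-0.33, 0.69, 0, 1, 1, 0, 0.8, 0, 0.88]]
proba = sample_model.predict_proba(query)  # shape: (1, number of classes)

# Columns of `proba` follow the order of `classes_`
for label, p in zip(sample_model.classes_, proba[0]):
    print(f"P({label}) = {p:.4f}")

# The predicted label is the class with the highest probability
best_label = sample_model.classes_[proba[0].argmax()]
print(best_label == sample_model.predict(query)[0])
```

With the notebook's own `model`, the equivalent call would be `model.predict_proba(...)` on the same instance.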