Support vector machine algorithm

A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression, although it is more commonly used for classification. An SVM looks for the hyperplane that best divides a dataset into two classes, that is, the one that leaves the largest margin to the nearest points of each class. Support vectors are the data points nearest to the hyperplane: the points that, if removed, would alter the position of the dividing hyperplane. The farther a data point lies from the hyperplane, the more confident we can be that it has been classified correctly.
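To make the idea of support vectors concrete, here is a minimal sketch on a small toy dataset. The six points and their labels are made up for illustration and are not part of the Fertility data; the linear kernel is chosen only so that the separating hyperplane is a straight line.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable points (invented for illustration only)
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 4.0], [4.0, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Training points that determine the position of the hyperplane
support_vectors = clf.support_vectors_

# Signed score per point: a larger absolute value means
# the point lies farther from the hyperplane
scores = clf.decision_function(X)

print(support_vectors)
print(scores)
```

Only a subset of the training points ends up in support_vectors_, and those are the points with the smallest absolute decision score.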

In its basic form, an SVM uses a linear function to split the data points of the input data; changing the kernel type lets it learn nonlinear boundaries instead. In scikit-learn, SVC uses the Radial Basis Function (RBF) kernel by default, which works well for many data problems.
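The effect of the kernel choice can be sketched on synthetic data. The two-ring dataset below comes from scikit-learn's make_circles helper and is only a stand-in chosen because no straight line can separate it; the sample size, noise level, and seed are arbitrary assumptions.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Synthetic two-ring data: the classes cannot be split by a straight line
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

default_model = SVC()                # kernel defaults to "rbf"
linear_model = SVC(kernel="linear")  # explicitly request a linear kernel

# Accuracy on the training data, used here only to compare the two kernels
default_acc = default_model.fit(X, y).score(X, y)
linear_acc = linear_model.fit(X, y).score(X, y)

print(default_model.kernel)
print(f"rbf: {default_acc:.2f}, linear: {linear_acc:.2f}")
```

Other values accepted by the kernel parameter include "poly" and "sigmoid".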

Importing libraries and packages

# Mathematical operations and data manipulation
import pandas as pd

# Model
from sklearn.svm import SVC

# Warnings
import warnings

warnings.filterwarnings("ignore")

%matplotlib inline

Set paths

# Path to datasets directory
data_path = "./datasets"
# Path to assets directory (for saving results to)
assets_path = "./assets"

Loading dataset

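The Fertility dataset aims to determine whether the fertility level of an individual has been affected by their demographics, their environmental conditions, and their previous medical conditions. The file has no header row; each row holds nine feature values followed by a diagnosis label, N or O (in the UCI description of this dataset, N stands for a normal diagnosis and O for an altered one).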

dataset = pd.read_csv(f"{data_path}/fertility_Diagnosis.csv", header=None)

Exploring dataset

# Shape of the dataset
print("Shape of the dataset: ", dataset.shape)
# Preview of the dataset (the notebook shows the first and last rows)
dataset
Shape of the dataset:  (100, 10)
         0     1  2  3  4  5    6   7     8  9
0    -0.33  0.69  0  1  1  0  0.8   0  0.88  N
1    -0.33  0.94  1  0  1  0  0.8   1  0.31  O
2    -0.33  0.50  1  0  0  0  1.0  -1  0.50  N
3    -0.33  0.75  0  1  1  0  1.0  -1  0.38  N
4    -0.33  0.67  1  1  0  0  0.8  -1  0.50  O
...
95   -1.00  0.67  1  0  0  0  1.0  -1  0.50  N
96   -1.00  0.61  1  0  0  0  0.8   0  0.50  N
97   -1.00  0.67  1  1  1  0  1.0  -1  0.31  N
98   -1.00  0.64  1  0  1  0  1.0   0  0.19  N
99   -1.00  0.69  0  1  1  0  0.6  -1  0.19  N

100 rows × 10 columns
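A natural next exploration step is to check how the two labels are distributed, since an uneven split affects how a classifier should be judged. The sketch below uses only the ten rows shown in the preview above as a stand-in for the CSV file; the names sample_csv, sample, and label_counts are introduced here for illustration, and counts on the full file will differ.

```python
import io

import pandas as pd

# The ten rows visible in the preview, used as a stand-in for the full file
sample_csv = """-0.33,0.69,0,1,1,0,0.8,0,0.88,N
-0.33,0.94,1,0,1,0,0.8,1,0.31,O
-0.33,0.50,1,0,0,0,1.0,-1,0.50,N
-0.33,0.75,0,1,1,0,1.0,-1,0.38,N
-0.33,0.67,1,1,0,0,0.8,-1,0.50,O
-1.00,0.67,1,0,0,0,1.0,-1,0.50,N
-1.00,0.61,1,0,0,0,0.8,0,0.50,N
-1.00,0.67,1,1,1,0,1.0,-1,0.31,N
-1.00,0.64,1,0,1,0,1.0,0,0.19,N
-1.00,0.69,0,1,1,0,0.6,-1,0.19,N
"""

sample = pd.read_csv(io.StringIO(sample_csv), header=None)

# Count how often each label occurs in the last column
label_counts = sample.iloc[:, 9].value_counts()
print(label_counts)
```

On the real data, the same call would be dataset.iloc[:, 9].value_counts().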

Modelling

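# Features: the first nine columns; target: the last column (N or O)
X = dataset.iloc[:, :9]
Y = dataset.iloc[:, 9]

# Fit a support vector classifier with scikit-learn's default settings
model = SVC()
model.fit(X, Y)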
SVC()
# Sanity check: predict for an instance with feature values
# -0.33, 0.69, 0, 1, 1, 0, 0.8, 0, 0.88
# (identical to row 0 of the training data, so this is not an independent test)
pred = model.predict([[-0.33, 0.69, 0, 1, 1, 0, 0.8, 0, 0.88]])
print(pred)
['N']
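The prediction above reuses a row the model was trained on, so it says little about how the model handles unseen data. One common way to get a fairer estimate is stratified k-fold cross-validation, sketched below. The data here is a synthetic stand-in generated with make_classification, with the same shape as the Fertility features (100 rows, 9 columns); X_demo, y_demo, the fold count, and the seeds are assumptions for illustration, and the scores say nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data (100 samples, 9 features); not the Fertility data
X_demo, y_demo = make_classification(n_samples=100, n_features=9, random_state=0)

# Stratified folds keep the class proportions similar in every split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One accuracy score per fold, each measured on data held out from training
scores = cross_val_score(SVC(), X_demo, y_demo, cv=cv)

print(scores)
print(scores.mean())
```

To apply this to the notebook's data, X_demo and y_demo would be replaced by X and Y from the modelling step.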