Lung cancer is one of the leading causes of death worldwide, and early detection is crucial to improving survival rates. In recent years, machine learning algorithms such as Support Vector Machines (SVM) have shown promising results in the early detection of lung cancer. In this article, we will walk through the process of developing a Python program for lung cancer detection using SVM.
Step 1: Gathering Data
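The first step in any machine learning project is to gather the necessary data. In our case, we will use the LIDC-IDRI dataset, a publicly available collection of thoracic CT scans distributed in DICOM format. It contains over 1,000 scans annotated by experienced thoracic radiologists, who marked lung nodules and rated each nodule's likelihood of malignancy on a scale of 1 to 5. The annotations therefore do not give a ready-made cancer/no-cancer label for each scan; such labels have to be derived from the radiologists' ratings.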
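In LIDC-IDRI, each nodule can receive a malignancy rating from 1 to 5 from several radiologists, so a rule is needed to turn those ratings into binary labels. The sketch below assumes the ratings for one nodule have already been parsed from the annotation files into a list of integers; the function name `label_from_ratings`, the median rule, and the thresholds (4 or above is malignant, 2 or below is benign, 3 is left out as ambiguous) are assumptions for illustration, not part of the dataset.

```python
from statistics import median


def label_from_ratings(ratings, malignant_threshold=4, benign_threshold=2):
    """Hypothetical labelling rule for one nodule's 1-5 malignancy ratings.

    Returns 1 (malignant), 0 (benign) or None (ambiguous / no ratings).
    The thresholds are an assumed convention, not defined by LIDC-IDRI.
    """
    if not ratings:
        return None
    score = median(ratings)  # robust to a single outlying reader
    if score >= malignant_threshold:
        return 1
    if score <= benign_threshold:
        return 0
    return None  # median of 3: excluded from training as ambiguous


print(label_from_ratings([4, 5, 4, 3]))
```

Dropping ambiguous nodules keeps the training labels cleaner, at the cost of fewer examples; other studies choose different thresholds.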
Step 2: Preprocessing the Data
Before we can use the data to train our SVM model, we need to preprocess it. This involves several steps:
a) Converting the CT scans to grayscale images.
b) Resizing the images to a standard size.
c) Normalizing pixel values to the range 0 to 1.
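When working from the original DICOM files, "converting to grayscale" usually means windowing the Hounsfield unit (HU) values of each slice into a display range. The sketch below assumes the slice is already a NumPy array in HU (for example, read with a DICOM library and rescaled with the file's slope and intercept). The lung-window settings (center -600 HU, width 1500 HU) are typical values used here as assumptions, and the helper names are hypothetical. Resizing uses simple nearest-neighbor sampling so the example depends only on NumPy; the article's code uses `cv2.resize` instead.

```python
import numpy as np


def hu_to_unit_range(hu_slice, center=-600.0, width=1500.0):
    """Window a CT slice in Hounsfield units and scale it to [0, 1].

    center/width are assumed lung-window settings.
    """
    low, high = center - width / 2, center + width / 2
    clipped = np.clip(hu_slice.astype(np.float64), low, high)
    return (clipped - low) / (high - low)


def resize_nearest(img, size=(128, 128)):
    """Nearest-neighbor resize to a fixed (rows, cols) size."""
    rows = (np.arange(size[0]) * img.shape[0] / size[0]).astype(int)
    cols = (np.arange(size[1]) * img.shape[1] / size[1]).astype(int)
    return img[np.ix_(rows, cols)]


# Example: a synthetic 512x512 slice filled with air (-1000 HU)
slice_hu = np.full((512, 512), -1000.0)
processed = resize_nearest(hu_to_unit_range(slice_hu))
print(processed.shape, processed.min(), processed.max())
```

Windowing combines steps a) and c): values below the window map to 0, values above it map to 1, and everything in between is scaled linearly.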
Step 3: Feature Extraction
The next step is to extract features from the preprocessed images. In our case, we will use a technique called Local Binary Patterns (LBP). LBP is a texture descriptor that labels each pixel by comparing it with its neighbors and then summarizes the image as a histogram of those labels. It has been shown to be effective in a number of computer vision tasks, such as texture classification and face recognition.
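To make the idea concrete, here is the LBP code for a single 3x3 patch. Each of the 8 neighbors is compared with the center pixel; a neighbor brighter than the center contributes a 1 bit. The starting neighbor and the clockwise order are conventions assumed for this sketch (they only need to be applied consistently), and the strict `>` comparison matches the article's code, although many LBP definitions use `>=`. The function name `lbp_code` is hypothetical.

```python
def lbp_code(patch):
    """LBP code of the center pixel of a 3x3 patch (list of 3 rows)."""
    center = patch[1][1]
    # Neighbors in clockwise order, starting at the top-left corner (assumed convention)
    neighbors = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                 patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for k, value in enumerate(neighbors):
        if value > center:
            code |= 1 << k  # set bit k for a brighter neighbor
    return code


patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 6, 3]]
print(lbp_code(patch))  # only 9 (bit 1) and 7 (bit 3) exceed 6: 2 + 8 = 10
```

Computing this code for every pixel and counting how often each of the 256 possible codes occurs gives the histogram used as the image's feature vector.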
Step 4: Splitting the Data
Once we have extracted features from each image, we need to split the data into training and testing sets. Holding out a test set that the model never sees during training lets us check whether the model is overfitting to the training data and estimate how well it generalizes to new data.
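A minimal sketch of the split with scikit-learn's `train_test_split`, using random stand-in data (100 hypothetical feature vectors of length 256, 30 of them positive) in place of real LBP histograms. Passing `stratify=labels` keeps the proportion of cancer cases the same in both sets, which matters when the classes are imbalanced.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 feature vectors, 30 positives and 70 negatives
rng = np.random.default_rng(0)
features = rng.random((100, 256))
labels = np.array([1] * 30 + [0] * 70)

# 80/20 split; stratify preserves the 30% positive rate in both sets
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels)

print(X_train.shape, X_test.shape, y_train.sum(), y_test.sum())
```

If several slices come from the same patient, splitting by patient rather than by image avoids near-duplicate slices appearing in both sets.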
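Step 5: Training the SVM Model
Now we are ready to train our SVM model using the scikit-learn library. The SVM algorithm finds the hyperplane that separates the two classes with the largest margin, with the parameter C controlling how strongly misclassified training points are penalized. We will use a radial basis function (RBF) kernel, a popular choice for SVM classification: it places that hyperplane in an implicit high-dimensional space, which corresponds to a non-linear decision boundary in the original feature space.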
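The RBF kernel measures similarity as K(x, y) = exp(-gamma * ||x - y||^2), and scikit-learn's `SVC(kernel='rbf')` uses this form. The short sketch below computes it by hand (the helper name `rbf` is hypothetical) and checks it against scikit-learn's `rbf_kernel`, using two toy normalized histograms as stand-ins for LBP features.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def rbf(x, y, gamma):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


# Two toy normalized histograms with no overlapping bins: ||a - b||^2 = 2
a = [1.0, 0.0, 0.0, 0.0]
b = [0.0, 1.0, 0.0, 0.0]
for gamma in (0.01, 1.0, 10.0):
    print(gamma, rbf(a, b, gamma), rbf_kernel([a], [b], gamma=gamma)[0, 0])
```

Because normalized histograms lie close together, a small gamma such as 0.01 gives a kernel value near 0.98 even for these completely different histograms, so the SVM can barely tell samples apart. This is why gamma should be chosen relative to the scale of the features, for example with `gamma='scale'` or a cross-validated search.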
Step 6: Evaluating the Model
Once we have trained our model, we need to evaluate its performance on the test set. We will use several metrics: accuracy, precision, recall, and F1 score.
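The four metrics can all be derived from the counts of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN). The hypothetical helper below computes them directly for binary labels where 1 means cancer; it is a sketch for illustration, while the article's code uses scikit-learn's metric functions.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of flagged cases that are real
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of real cases that were flagged
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# 4 cancer cases, 6 healthy: one cancer missed (FN), one healthy case flagged (FP)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))  # (0.8, 0.75, 0.75, 0.75)
```

In a detection setting, recall (sensitivity) deserves particular attention, since a false negative is a missed cancer.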
Step 7: Making Predictions
Finally, we can use our trained SVM model to make predictions on new data. We will use a sample CT scan to demonstrate how the model works.
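One detail when predicting a single image: scikit-learn estimators expect a 2-D array of shape (n_samples, n_features), so one feature vector must be reshaped to a single row. The sketch below wraps this in a hypothetical `predict_single` helper and trains on two synthetic, well-separated clusters that stand in for real LBP histograms.

```python
import numpy as np
from sklearn.svm import SVC


def predict_single(model, feature_vector):
    """Classify one feature vector with a fitted scikit-learn classifier."""
    x = np.asarray(feature_vector, dtype=float).reshape(1, -1)  # one row = one sample
    return int(model.predict(x)[0])


# Hypothetical toy data: class 0 clustered near 0, class 1 clustered near 1
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 4)), rng.normal(1.0, 0.1, (20, 4))])
y = np.array([0] * 20 + [1] * 20)
model = SVC(kernel="rbf", gamma="scale").fit(X, y)

print(predict_single(model, [1.0, 1.0, 1.0, 1.0]))
print(predict_single(model, [0.0, 0.0, 0.0, 0.0]))
```

Extracting the scalar with `[0]` avoids comparing a one-element array with `==` in the later `if` statement.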

Now, let's look at the complete code implementation for lung cancer detection using SVM.
```python
# Importing the necessary libraries
import os

import cv2
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

IMG_SIZE = (128, 128)


def load_and_preprocess(filename):
    # Step 2 (per image): read as grayscale, resize, scale pixels to [0, 1]
    img = cv2.imread(filename, cv2.IMREAD_GRAYSCALE)
    if img is None:  # cv2.imread returns None instead of raising on a bad path
        raise FileNotFoundError(f"Could not read image: {filename}")
    img = cv2.resize(img, IMG_SIZE)
    return img / 255.0


# Step 1: Gathering Data
# Assumes the LIDC-IDRI slices have been exported from DICOM to 2-D images
# named 0.tif ... 999.tif. The placeholder labels assume the first 500 images
# are cancerous and the rest are not; replace them with labels derived from
# the dataset's annotations.
data_path = '/path/to/LIDC-IDRI/dataset'
cancer_labels = [1] * 500 + [0] * 500
data = []
for i in range(len(cancer_labels)):
    filename = os.path.join(data_path, f"{i}.tif")
    data.append(load_and_preprocess(filename))

# Step 2: Preprocessing the Data (stack images and labels into arrays)
data = np.array(data)
labels = np.array(cancer_labels)


# Step 3: Feature Extraction
def get_lbp_features(img, radius=1, n_points=8):
    # Basic LBP: for each interior pixel, set bit k if the k-th neighbor on a
    # circle of the given radius (rounded to the nearest pixel) is brighter
    # than the center. With n_points=8 the codes lie in 0..255, matching the
    # 256-bin histogram. Array slicing replaces slow per-pixel Python loops.
    h, w = img.shape
    center = img[radius:h - radius, radius:w - radius]
    lbp = np.zeros(center.shape, dtype=np.int64)
    for k in range(n_points):
        dy = int(round(radius * np.cos(2 * np.pi * k / n_points)))
        dx = int(round(-radius * np.sin(2 * np.pi * k / n_points)))
        neighbor = img[radius + dy:h - radius + dy, radius + dx:w - radius + dx]
        lbp |= (neighbor > center).astype(np.int64) << k
    # The normalized histogram of LBP codes is the feature vector
    n_bins = 2 ** n_points
    hist, _ = np.histogram(lbp, bins=n_bins, range=(0, n_bins))
    hist = hist.astype("float")
    hist /= (hist.sum() + 1e-7)
    return hist


features = np.array([get_lbp_features(img) for img in data])

# Step 4: Splitting the Data (stratify keeps the class ratio in both sets)
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels)

# Step 5: Training the SVM Model
# gamma='scale' derives the RBF width from the feature variance; C and gamma
# should be tuned (e.g. with cross-validation) on real data.
model = SVC(kernel='rbf', C=10, gamma='scale')
model.fit(X_train, y_train)

# Step 6: Evaluating the Model
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("F1 Score:", f1_score(y_test, y_pred))

# Step 7: Making Predictions
sample_filename = '/path/to/sample/image.tif'
sample_img = load_and_preprocess(sample_filename)
sample_features = get_lbp_features(sample_img).reshape(1, -1)  # one sample as a 2-D row
sample_prediction = model.predict(sample_features)[0]
if sample_prediction == 1:
    print("The sample image indicates the presence of lung cancer.")
else:
    print("The sample image does not indicate the presence of lung cancer.")
```
Here are some useful links related to lung cancer detection and the tools used in the Python program:
- Lung Image Database Consortium (LIDC-IDRI): https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI
- Support Vector Machine (SVM) in scikit-learn: https://scikit-learn.org/stable/modules/svm.html
- Local Binary Patterns (LBP) feature extraction: https://scikit-image.org/docs/dev/auto_examples/features_detection/plot_local_binary_pattern.html
- OpenCV (Open Source Computer Vision Library): https://opencv.org/
- Python Programming Language: https://www.python.org/
- Google Colaboratory: https://colab.research.google.com/
- Medical News Today (source of sample lung image): https://www.medicalnewstoday.com/