Logistic Regression Model Using PyTorch

Author: Antonio Lorenzo

Subject: Machine learning

This project presents a simple binary classification problem using logistic regression. The goal is to train a model to distinguish between two groups of data, which are centered around arbitrary points, using PyTorch. Below, I explain each part of the code.

Importing Libraries

import torch
import matplotlib.pyplot as plt
import numpy as np
import torch.nn as nn
import torch.optim as optim

We begin by importing the necessary libraries:

  • torch for working with tensors and neural networks.
  • matplotlib.pyplot to visualize data.
  • numpy for efficient numerical operations.
  • torch.nn to define our model (logistic regression).
  • torch.optim to implement optimization algorithms like SGD (Stochastic Gradient Descent).
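Because the data below is generated randomly, exact numbers will differ between runs. As an optional aside (not part of the original code), seeding both libraries makes results reproducible:

# Optional: fix the random seeds for reproducible runs
np.random.seed(42)
torch.manual_seed(42)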

Preparing the Data

# Dataset size: n points split into two halves, each point with 2 features
n = 100
h = n // 2
dimen = 2

# Creating random data
data = np.random.randn(n, dimen) * 3

We generate a dataset of 100 points with 2 features (dimensions) each using np.random.randn, scaling by 3 so the points have a standard deviation of 3 instead of 1. The data is not yet labeled; classes are assigned below.
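A quick optional check (my addition, not in the original) confirms the shape and spread of the generated data:

print(data.shape)        # (100, 2)
print(data.std(axis=0))  # roughly [3, 3], since randn draws have unit standard deviation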

Plotting the Raw Data

plt.scatter(data[:, 0], data[:, 1])

[Figure: scatter plot of the unlabeled random data]

Here, we plot the initial random data points. Each point represents a data sample in a 2D space. At this stage, the data is not centered around any specific points, and all points are plotted in a single color.

Centering the Data

# Shifting each half of the data toward an arbitrary center
data[:h, :] = data[:h, :] - 3 * np.ones((h, dimen))
data[h:, :] = data[h:, :] + 3 * np.ones((h, dimen))

The dataset is split into two groups:

  • The first half of the data (data[:h, :]) is shifted towards (-3, -3).
  • The second half (data[h:, :]) is shifted towards (3, 3).

This creates two distinguishable clusters that we can classify.
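As an optional sanity check (my addition, not in the original), the empirical mean of each half should now sit near its chosen center:

print(data[:h].mean(axis=0))  # approximately [-3, -3]
print(data[h:].mean(axis=0))  # approximately [ 3,  3]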

Plotting the Centered Data

colors = ['blue', 'red']
color = np.array([colors[0]] * h + [colors[1]] * h)
plt.scatter(data[:, 0], data[:, 1], c=color, s=75, alpha=0.6)

[Figure: scatter plot of the two shifted clusters, blue around (-3, -3) and red around (3, 3)]

Now we visualize the two groups of data:

  • The blue points represent one class.
  • The red points represent the second class.

The points are clearly centered around (-3, -3) and (3, 3).

Defining the Target and Tensors

# Creating the target labels: 0 for the first half, 1 for the second
target = np.array([0] * h + [1] * h).reshape(n, 1)

# Creating tensors for input (x) and output (y)
x = torch.from_numpy(data).float().requires_grad_(True)
y = torch.from_numpy(target).float()

Here, we create the target labels:

  • 0 for the first group (blue points).
  • 1 for the second group (red points).

The input data (x) and the target (y) are converted into float PyTorch tensors. The target is reshaped to (n, 1) so it matches the shape of the model's output. Note that calling requires_grad_(True) on the input is not strictly needed: during training, PyTorch tracks gradients for the model's parameters regardless; the call is harmless but could be omitted.

Building the Logistic Regression Model

# Model definition
model = nn.Sequential(
    nn.Linear(2, 1),
    nn.Sigmoid()
)

We define a simple logistic regression model using PyTorch's Sequential API:

  • nn.Linear(2, 1) creates a linear transformation layer with 2 input features and 1 output.
  • nn.Sigmoid() applies the sigmoid function to the linear output, squashing it into (0, 1) so it can be read as a probability (illustrated below).
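The sigmoid is defined as sigmoid(z) = 1 / (1 + e^(-z)); it maps any real number into (0, 1). A small illustration (my addition, not in the original) showing that nn.Sigmoid matches the manual formula:

z = torch.tensor([-2.0, 0.0, 2.0])
print(nn.Sigmoid()(z))          # tensor([0.1192, 0.5000, 0.8808])
print(1 / (1 + torch.exp(-z)))  # same values, computed by hand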

Training the Model

# Loss function and optimizer
loss_function = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Training loop
losses = []
iterations = 1000
for i in range(iterations):
    # Forward pass: compute predictions and the loss
    result = model(x)
    loss = loss_function(result, y)
    losses.append(loss.item())

    # Backward pass: reset gradients, backpropagate, update parameters
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Here we set up the training process:

  • The loss function is binary cross-entropy (nn.BCELoss()), the standard choice for binary classification (see the sketch after this list).
  • Stochastic Gradient Descent (optim.SGD) is used as the optimizer with a learning rate of 0.01.
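For reference, for predicted probabilities p and targets y, BCE is -mean(y*log(p) + (1 - y)*log(1 - p)). A minimal sketch (my addition, not in the original) verifying the manual formula against nn.BCELoss; it can run at any point after model, x, and y are defined:

p = model(x)  # predicted probabilities, shape (n, 1)
manual_bce = -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()
print(manual_bce.item(), nn.BCELoss()(p, y).item())  # the two values agree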

The model is trained for 1000 iterations. In each iteration:

  1. The forward pass computes the model’s prediction.
  2. The loss is calculated between the predicted output and the true target.
  3. Gradients are reset with optimizer.zero_grad(), computed with loss.backward(), and the parameters are updated with optimizer.step().
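Once the loop finishes, a quick accuracy check (my addition, not in the original post) shows how well the model separates the two clusters:

# Fraction of training points assigned to the correct class
with torch.no_grad():
    predicted_classes = (model(x) > 0.5).float()
accuracy = (predicted_classes == y).float().mean().item()
print(f"Training accuracy: {accuracy:.2%}")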

Plotting the Loss

plt.plot(range(iterations), losses)
[Figure: training loss decreasing over 1000 iterations]

After training, we visualize the loss over iterations. The decreasing curve indicates that the model is learning and the loss is being minimized.

Testing the Model

We test the model by predicting the class of new points. The model outputs a probability for class 1 (red); values above 0.5 map to red and values below to blue. The point (-5, -6) lies deep in the blue cluster and is classified as blue; the point (5, 6) is classified as red.

blue = torch.Tensor([[-5, -6]])
prediction = int(model(blue).data[0][0] > 0.5)
print(colors[prediction])  # blue

red = torch.Tensor([[5, 6]])
prediction = int(model(red).data[0][0] > 0.5)
print(colors[prediction])  # red
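These are inference-only calls, so as a small optional refinement (my addition) they can run under torch.no_grad(), which skips building the autograd graph and also exposes the raw probability:

with torch.no_grad():
    probability = model(torch.Tensor([[-5, -6]]))[0][0].item()
print(colors[int(probability > 0.5)], f"(p = {probability:.4f})")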

Visualizing the Decision Boundary

# Extract the learned weights and bias
w = list(model.parameters())
w0 = w[0].data.numpy()  # weights, shape (1, 2)
w1 = w[1].data.numpy()  # bias, shape (1,)

plt.scatter(data[:, 0], data[:, 1], c=color, s=75, alpha=0.6)
x_axis = np.linspace(-10, 10, n)
y_axis = -(w1[0] + x_axis * w0[0][0]) / w0[0][1]
plt.plot(x_axis, y_axis, 'g--')

Finally, we visualize the decision boundary learned by the model. The predicted probability is exactly 0.5 where the linear part is zero, i.e. where w1*x1 + w2*x2 + b = 0; solving for x2 gives x2 = -(b + w1*x1) / w2, which is the line computed above. This line separates the two groups based on the learned weights.

[Figure: the two clusters with the learned decision boundary drawn as a green dashed line]
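As a last optional check (my addition, not in the original post), any point on the plotted line should receive a probability close to 0.5:

x1 = 0.0
x2 = -(w1[0] + x1 * w0[0][0]) / w0[0][1]  # a point exactly on the boundary
with torch.no_grad():
    print(model(torch.Tensor([[x1, x2]])).item())  # approximately 0.5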

Conclusion

This project demonstrates the implementation of a simple logistic regression classifier using PyTorch. The model successfully learns to separate two distinct groups of data, and we visualize the decision boundary that divides the two classes.