Simple Linear Regression Model
Author: Antonio Lorenzo (@ToniLorenzo28)
Subject: Machine learning
Language: English
In this project, I will demonstrate how to build a linear regression model using the scikit-learn library to predict housing prices based on the number of rooms. The dataset used is the "Boston Housing Dataset," which contains information about different factors that influence housing prices in Boston.
Importing Libraries
First, we need to import the necessary libraries. These include numpy for numerical operations, matplotlib for plotting, and sklearn for accessing datasets and implementing the linear regression algorithm.
# Importing Libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
Loading and Understanding the Data
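We use the Boston dataset from sklearn to train our model. This dataset contains 13 features describing factors that affect housing prices. Note that load_boston was deprecated in scikit-learn 1.0 and removed in version 1.2, so the code below requires an older release of the library.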
# Preparing the Data
# We load the Boston dataset from sklearn's library
boston = datasets.load_boston()
Next, we explore the structure of the dataset by looking at its keys and the data it contains. This helps us understand what the dataset is composed of.
# Understanding the Data
# Checking the information contained in the dataset
print("Dataset Information")
print(boston.keys())
We can also review the description of the dataset to get a better understanding of its contents, though it's commented out here.
# Checking the dataset description
# print(boston.DESCR)
We then verify the size and shape of the dataset, which tells us how many rows and columns are present in the data.
# Checking the amount of data in the dataset
print("Dataset Shape")
print(boston.data.shape)
Finally, we print the names of the columns, which correspond to the different features in the dataset, such as crime rate, number of rooms, and property tax.
# Checking the column names
print("Column Names:")
print(boston.feature_names)
Column Names:
['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO' 'B' 'LSTAT']
Preparing the Data for Linear Regression
To perform simple linear regression, we focus on one feature: RM, the average number of rooms per dwelling. It sits at index 5 of the feature array (counting from zero), which makes it the sixth column.
# Selecting only the data from column 5 (number of rooms)
X = boston.data[:, np.newaxis, 5]
# Defining the target data (median house value)
y = boston.target
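The np.newaxis indexing above keeps the selected column two-dimensional, which is the shape scikit-learn estimators expect for their feature matrix. The following is a minimal sketch of that behaviour on a small synthetic array; the array `data` and its values are made up for illustration and merely stand in for boston.data.

```python
import numpy as np

# Synthetic stand-in for boston.data: 4 rows, 13 feature columns (made-up values).
data = np.arange(4 * 13, dtype=float).reshape(4, 13)

col_1d = data[:, 5]              # shape (4,): a flat vector
col_2d = data[:, np.newaxis, 5]  # shape (4, 1): a single-column matrix
col_alt = data[:, [5]]           # same single-column matrix via list indexing

print(col_1d.shape, col_2d.shape, col_alt.shape)
print(np.array_equal(col_2d, col_alt))
```

Either two-dimensional form can be passed to fit; the flat vector cannot, because scikit-learn would not know whether it holds one sample or one feature.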
To visualize the data, we plot the number of rooms against the median home value.
# Plotting the data
plt.scatter(X,y)
plt.xlabel("Number of rooms")
plt.ylabel("Median value")
plt.show()

Implementing Simple Linear Regression
We now proceed to implement linear regression using the LinearRegression model from sklearn.
Splitting the Data
Before training the model, we split the dataset into training and test sets. This allows us to evaluate the model's performance on unseen data.
# Importing the train_test_split function
from sklearn.model_selection import train_test_split
# Splitting the data into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
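Because the split above shuffles the rows at random, each run produces different training and test sets, and therefore slightly different results. One common way to make a run repeatable is to pass a fixed random_state. The sketch below shows this on synthetic data; the names `X_demo` and `y_demo` and the seed 42 are arbitrary choices for illustration, not part of the original project.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data (assumed for illustration): 10 samples, 1 feature.
X_demo = np.arange(10, dtype=float).reshape(-1, 1)
y_demo = 2.0 * X_demo.ravel() + 1.0

# random_state fixes the shuffle, so repeated calls give the same split.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print(X_tr.shape, X_te.shape)       # 8 training rows and 2 test rows
print(np.array_equal(X_te, X_te2))  # identical test sets for the same seed
```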
Training the Model
We define and train the linear regression model using the training data.
# Defining the linear regression algorithm
lr = linear_model.LinearRegression()
# Training the model
lr.fit(X_train, y_train)
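For a single feature, ordinary least squares has a simple closed-form solution, and it can help to see it written out. The sketch below computes the slope and intercept by hand with NumPy on synthetic data and cross-checks them against np.polyfit. The data, the noise level, and the variable names are assumptions made for illustration; this shows the underlying formula, not scikit-learn's internal implementation.

```python
import numpy as np

# Synthetic data (assumed): roughly linear with some Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(4.0, 9.0, size=50)
y = 9.0 * x - 35.0 + rng.normal(0.0, 1.0, size=50)

# Closed-form least-squares estimates for one feature:
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   intercept = mean_y - slope * mean_x
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# Cross-check with NumPy's degree-1 polynomial fit (returns slope first).
slope_ref, intercept_ref = np.polyfit(x, y, deg=1)

print(slope, intercept)
print(np.isclose(slope, slope_ref), np.isclose(intercept, intercept_ref))
```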
Making Predictions
After training, we use the model to predict housing prices based on the test data.
# Making predictions
Y_pred = lr.predict(X_test)
We also visualize the predictions by plotting the test data points alongside the regression line.
# Plotting the test data and the model's prediction line
plt.scatter(X_test, y_test)
plt.plot(X_test,Y_pred,color='red',linewidth=3)
plt.title('Simple Linear Regression')
plt.xlabel('Number of rooms')
plt.ylabel('Median value')
plt.show()

Model Evaluation
Finally, we evaluate the model by extracting key information: the slope (coefficient), the intercept, and the R² score, which measures how much of the variation in prices the model explains.
Model Coefficients
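The coefficient (slope) indicates how much the predicted median house value changes for each additional room. The intercept is the predicted value when the number of rooms is zero; that point lies far outside the observed data, so it is not meaningful on its own, but it fixes the vertical position of the line. The target is expressed in thousands of dollars, so both numbers are in those units.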
# Displaying the model's coefficients
print("Simple Linear Regression Model Details")
print('Coefficient (Slope):')
print(lr.coef_)
print('Intercept:')
print(lr.intercept_)
The regression equation derived from the model is:
# Displaying the equation of the model
print("The model's equation is:")
# coef_ is an array with one entry per feature, so we take its first element
print(f"y = {lr.coef_[0]:.3f} * x + ({lr.intercept_:.3f})")
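To make the link between the printed equation and the model's predictions concrete, the sketch below fits a LinearRegression on a handful of made-up points and checks that applying the equation by hand gives the same numbers as predict. The data values and the names `X_demo`, `y_demo`, and `model` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data for illustration: rooms versus value.
X_demo = np.array([[4.0], [5.0], [6.0], [7.0], [8.0]])
y_demo = np.array([10.0, 15.5, 19.0, 26.0, 30.5])

model = LinearRegression().fit(X_demo, y_demo)

# Apply y = slope * x + intercept by hand and compare with predict().
manual = model.coef_[0] * X_demo.ravel() + model.intercept_
print(np.allclose(manual, model.predict(X_demo)))
```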
Model Accuracy
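To assess the fit, we use the R² score (the coefficient of determination), which is the value returned by the model's score method. It is the fraction of the variance in the target that the model explains. Note that the code below scores the model on the training data; calling lr.score(X_test, y_test) instead would measure performance on the held-out test set.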
# Displaying the model's accuracy
print('Model Accuracy:')
print(lr.score(X_train, y_train))
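The value returned by score for a regressor is R², defined as 1 minus the ratio of the residual sum of squares to the total sum of squares. The following is a minimal NumPy sketch of that definition; the helper name `r_squared` and the sample values are made up for illustration.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

y_true = np.array([3.0, 5.0, 7.0, 9.0])
perfect = y_true.copy()                          # predictions that match exactly
mean_only = np.full_like(y_true, y_true.mean())  # always predicting the mean

print(r_squared(y_true, perfect))    # 1.0
print(r_squared(y_true, mean_only))  # 0.0
```

A score of 1 means the predictions match the targets exactly, 0 means the model does no better than always predicting the mean, and negative values are possible when it does worse than that.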
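In this case, R² is approximately 0.442, meaning the model explains about 44% of the variance in house prices; the rest depends on factors other than the number of rooms, so there is still room for improvement. Because the train/test split is random, the exact value varies from run to run.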
This project demonstrates the basic steps involved in implementing simple linear regression to predict house prices based on the number of rooms. Through this process, we explore how machine learning models can learn patterns from data and make predictions.