Simple Autoencoder

Title: Simple Autoencoder

Author: Antonio Lorenzo

Subject: Machine learning

Language: English

Autoencoder structure

Autoencoders are a type of neural network used to learn efficient representations of data, typically for the purpose of dimensionality reduction. In this project, we aim to build a simple autoencoder that can reconstruct images by compressing them into a latent space representation and then reconstructing them back to the original format.

The architecture of an autoencoder generally consists of two parts:

Encoder: Compresses the input data into a smaller, dense representation.
Decoder: Reconstructs the original data from the compressed representation.
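
The two parts above can be sketched in plain NumPy to show how the shapes change. This is only an illustration with random, untrained weights: the sizes 784 and 64 match the Fashion MNIST model built later in this project, and the names `encode`, `decode`, `W_enc`, and `W_dec` are hypothetical, not part of Keras.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, untrained weights: 784 input values -> 64 latent values -> 784 output values
W_enc = rng.normal(scale=0.01, size=(784, 64))
W_dec = rng.normal(scale=0.01, size=(64, 784))

def encode(image):
    """Flatten a 28x28 image and project it to a 64-value latent vector (ReLU)."""
    flat = image.reshape(-1)
    return np.maximum(flat @ W_enc, 0.0)

def decode(latent):
    """Project a latent vector back to 784 values (ReLU) and reshape to 28x28."""
    return np.maximum(latent @ W_dec, 0.0).reshape(28, 28)

image = rng.random((28, 28))   # stand-in for a normalized grayscale image
latent = encode(image)
reconstruction = decode(latent)

print(latent.shape)          # (64,)
print(reconstruction.shape)  # (28, 28)
```

With untrained weights the reconstruction is meaningless; training is what makes the decoder output resemble the input.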

[Figure: high-level visualization of the autoencoder architecture]

Import dependencies

We start by importing the necessary libraries for building and visualizing our autoencoder. TensorFlow and Keras will handle the neural network's operations, while Matplotlib will help in visualizing the results.

import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt

Dataset: Fashion MNIST

We first try the autoencoder on the Fashion MNIST dataset, which consists of 70,000 grayscale images of 28x28 pixels (60,000 for training and 10,000 for testing) across 10 clothing categories.

Load and Explore Data

We load the dataset and check the shape and label of the first image to understand its structure. This helps ensure that the data is correctly loaded and ready for processing.

fashion_mnist = tf.keras.datasets.fashion_mnist
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

print('Image shape: ', x_train[0].shape)
print('Label: ', y_train[0])

# Visualize an example from the data
plt.title(f'Label: {y_train[0]} = Ankle boot')
plt.imshow(x_train[0], cmap='gray')

The code prints the shape of the first image (28x28 pixels) and its integer label, then displays the image, which is an ankle boot.
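
The plot title above hard-codes the class name. Fashion MNIST stores only integer labels, so a lookup list is a common way to turn a label into a name. The sketch below uses the standard class order for this dataset; the variable name `class_names` and the helper `label_to_name` are choices made here for illustration.

```python
# Standard Fashion MNIST class names, indexed by integer label (0 to 9)
class_names = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def label_to_name(label):
    """Map an integer label to its class name."""
    return class_names[int(label)]

print(label_to_name(9))  # Ankle boot
```

The title could then be built as `f'Label: {y_train[0]} = {label_to_name(y_train[0])}'` instead of typing the name by hand.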

Preprocessing

The raw pixel values are integers from 0 to 255. To make the data easier for the model to work with, we normalize them to the range [0, 1] by dividing by 255.

x_train = x_train/255.0
x_test = x_test/255.0
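
As a small check of what this scaling does, the sketch below applies the same division to a synthetic `uint8` array (a stand-in for the real images, so it runs without downloading the dataset). Dividing by `255.0` also converts the integers to floating point.

```python
import numpy as np

# Synthetic stand-in for 8-bit grayscale pixel data
pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)

scaled = pixels / 255.0   # true division promotes uint8 to float64

print(scaled.dtype)                # float64
print(scaled.min(), scaled.max())  # 0.0 1.0
```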

Building the Autoencoder

Encoder

The encoder compresses the input image into a smaller latent space representation. This involves flattening the input and passing it through a dense layer with 64 units, reducing the 784 input pixels (28x28) to just 64 dimensions.

# Encoder layers
encoder_input = keras.Input(shape=(28,28,1), name='Imagen')
x = keras.layers.Flatten()(encoder_input)
encoder_output = keras.layers.Dense(64, activation='relu')(x)
# Assemble the encoder model
encoder = keras.Model(encoder_input, encoder_output, name='Encoder')

Decoder

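The decoder reconstructs the original image from the 64-dimensional latent space. It passes the latent vector through a dense layer with 64 units, expands it to 784 values with a second dense layer, and reshapes the result back to the original 28x28 pixel format.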

# Create the decoder
# Decoder layers
decoder_input = keras.layers.Dense(64, activation="relu")(encoder_output)
x = keras.layers.Dense(784, activation="relu")(decoder_input)  # 784 = 28*28
decoder_output = keras.layers.Reshape((28, 28, 1))(x)  # Restore the shape of the original image

Compiling the Model

We compile the model with the Adam optimizer, using Mean Squared Error (MSE) as the loss function, which measures the difference between the original and reconstructed images.

# Optimizer used to reduce the error between the original image and its reconstruction from the latent space
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

# Assemble the full autoencoder (encoder input to decoder output)
autoencoder = keras.Model(encoder_input, decoder_output, name='autoencoder')

# Compile
autoencoder.compile(optimizer, loss='mse')

autoencoder.summary()

    Model: "autoencoder"
    _________________________________________________________________
     Layer (type)                Output Shape              Param #   
    =================================================================
     Imagen (InputLayer)         [(None, 28, 28, 1)]       0         
                                                                     
     flatten (Flatten)           (None, 784)               0         
                                                                     
     dense (Dense)               (None, 64)                50240     
                                                                     
     dense_1 (Dense)             (None, 64)                4160      
                                                                     
     dense_2 (Dense)             (None, 784)               50960     
                                                                     
     reshape (Reshape)           (None, 28, 28, 1)         0         
                                                                     
    =================================================================
    Total params: 105360 (411.56 KB)
    Trainable params: 105360 (411.56 KB)
    Non-trainable params: 0 (0.00 Byte)
    _________________________________________________________________
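
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-output pair plus one bias per output unit. The helper `dense_params` below is a hypothetical name used only for this check.

```python
def dense_params(n_inputs, n_outputs):
    """Weights (n_inputs * n_outputs) plus one bias per output unit."""
    return n_inputs * n_outputs + n_outputs

# (inputs, outputs) for dense, dense_1 and dense_2 in the summary
layers = [(784, 64), (64, 64), (64, 784)]
counts = [dense_params(i, o) for i, o in layers]

print(counts)       # [50240, 4160, 50960]
print(sum(counts))  # 105360
```

The Flatten and Reshape layers only rearrange values, which is why they contribute no parameters.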

Training the Autoencoder

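We train the autoencoder for 5 epochs on the Fashion MNIST training set. Because the goal is reconstruction, the images serve as both input and target, and the model is trained to minimize the loss between the original input and the reconstructed output. We hold out 10% of the training data for validation and save the model after every epoch.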

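# Train the model
epochs = 5
# The input and the target are the same array (x_train)
for epoch in range(epochs):
    history = autoencoder.fit(
        x_train,
        x_train,
        epochs=1,
        batch_size=32,
        validation_split=0.10,
    )
    # Save a checkpoint after every epoch (SavedModel format in TF 2.x;
    # Keras 3 expects a .keras or .h5 file extension instead)
    autoencoder.save(f"models/AE-{epoch+1}.model")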

1688/1688 [==============================] - 13s 4ms/step - loss: 0.0316 - val_loss: 0.0245
1688/1688 [==============================] - 10s 6ms/step - loss: 0.0227 - val_loss: 0.0223
1688/1688 [==============================] - 8s 5ms/step - loss: 0.0214 - val_loss: 0.0211
1688/1688 [==============================] - 9s 5ms/step - loss: 0.0204 - val_loss: 0.0203
1688/1688 [==============================] - 6s 4ms/step - loss: 0.0197 - val_loss: 0.0197
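
The `loss` and `val_loss` values in this log are mean squared errors. As a rough sketch of what that metric measures, the NumPy snippet below averages the squared per-pixel differences between an image and a reconstruction. The two small arrays are made-up values, not outputs of the model, and `mse` is a helper written for this example.

```python
import numpy as np

def mse(original, reconstruction):
    """Mean of the squared per-pixel differences."""
    return float(np.mean((original - reconstruction) ** 2))

# Made-up 2x2 "images" with values in [0, 1]
original = np.array([[0.0, 0.5], [1.0, 0.25]])
reconstruction = np.array([[0.1, 0.5], [0.8, 0.25]])

# Squared differences are 0.01, 0, 0.04 and 0, so the mean is about 0.0125
print(mse(original, reconstruction))
```

A perfect reconstruction gives a loss of 0, so the falling values in the log indicate that the reconstructions are getting closer to the inputs.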

Exploring the Latent Space

After training, we pass an image through the encoder to obtain its latent space representation. The latent space is a compressed version of the image, consisting of 64 values.

# Generate the latent representation of an image with the encoder
example = encoder.predict([ x_test[0].reshape(-1, 28, 28, 1) ])
print('Shape of the latent space: ', example[0].shape)

1/1 [==============================] - 0s 31ms/step
Shape of the latent space: (64,)

Results

We visualize the original image, the latent space, and the reconstructed image to see how well the autoencoder performed.

# Plot the latent space (only 64 values instead of the 784 input pixels)
#plt.imshow(example[0].reshape((8,8)), cmap="gray") #Latent Space

# Show the original example
#plt.imshow(x_test[0], cmap="gray")

# Reconstruct the original image with the autoencoder
ae_out = autoencoder.predict([ x_test[0].reshape(-1, 28, 28, 1) ])
img = ae_out[0]  # predict takes a batch and returns a batch, even for a single image, so we take element 0
#plt.imshow(ae_out[0], cmap="gray")

1/1 [==============================] - 0s 17ms/step

# Visualize the results
fig = plt.figure(figsize=(10, 7))

# Original image
fig.add_subplot(1, 3, 1)
plt.imshow(x_test[0], cmap="gray")
plt.axis('off')
plt.title("Original Image")

# Latent space (64 values shown as an 8x8 grid)
fig.add_subplot(1, 3, 2)
plt.imshow(example[0].reshape((8,8)), cmap="gray")
plt.axis('off')
plt.title("Latent space")

# Reconstructed image
fig.add_subplot(1, 3, 3)
plt.imshow(ae_out[0], cmap="gray")
plt.axis('off')
plt.title("Reconstructed image")
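
The 8x8 grid used for the latent space is only a display convenience: the 64 latent values come from a dense layer and have no spatial layout. A small sketch of that reshaping follows, using a hypothetical helper `latent_to_grid` and a made-up vector in place of the encoder output.

```python
import math
import numpy as np

def latent_to_grid(latent):
    """Arrange a 1-D latent vector as a square grid so it can be passed to imshow."""
    side = math.isqrt(latent.size)
    if side * side != latent.size:
        raise ValueError("latent size must be a perfect square")
    return latent.reshape(side, side)

latent = np.arange(64, dtype=float)   # stand-in for example[0]
grid = latent_to_grid(latent)

print(grid.shape)  # (8, 8)
```

The same helper works for any latent size that is a perfect square; other sizes would need a rectangular layout.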

Testing with the CIFAR-10 Dataset

For a more complex task, we test the autoencoder on the CIFAR-10 dataset, which contains color images from 10 different categories, such as animals and vehicles.

Load and Visualize Data

We load the CIFAR-10 dataset and check the shape of the images, which are now 32x32 pixels with three color channels (RGB).

#Load data
cifar10 = tf.keras.datasets.cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

print('Image shape: ', x_train[0].shape)
print('Label: ', y_train[0][0])

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 [==============================] - 13s 0us/step

Image shape: (32, 32, 3)
Label: 6

# Visualize an example from the data
plt.title(f'Label: {y_train[0][0]} = Frog')
plt.imshow(x_train[0])

# Scale the pixel values to the range [0, 1]
x_train = x_train/255.0
x_test = x_test/255.0

Adjusting the Autoencoder

For CIFAR-10, we modify the encoder and decoder to handle the 32x32 RGB images, increasing the latent space to 256 dimensions.

# Create the encoder
# Encoder layers
encoder_input = keras.Input(shape=(32,32,3), name='Imagen')
x = keras.layers.Flatten()(encoder_input)
encoder_output = keras.layers.Dense(256, activation='relu')(x)
# Assemble the encoder model
encoder = keras.Model(encoder_input, encoder_output, name='Encoder')

# Create the decoder
# Decoder layers
decoder_input = keras.layers.Dense(256, activation="relu")(encoder_output)
x = keras.layers.Dense(3072, activation="relu")(decoder_input)  # 3072 = 32*32*3
decoder_output = keras.layers.Reshape((32, 32, 3))(x)  # Restore the shape of the original image

# Optimizer used to reduce the error between the original image and its reconstruction from the latent space
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
# Assemble the full autoencoder (encoder input to decoder output)
autoencoder = keras.Model(encoder_input, decoder_output, name='autoencoder')
autoencoder.summary()
# Compile
autoencoder.compile(optimizer, loss='mse')

    Model: "autoencoder"
    _________________________________________________________________
     Layer (type)                Output Shape              Param #   
    =================================================================
     Imagen (InputLayer)         [(None, 32, 32, 3)]       0         
                                                                     
     flatten_1 (Flatten)         (None, 3072)              0         
                                                                     
     dense_3 (Dense)             (None, 256)               786688    
                                                                     
     dense_4 (Dense)             (None, 256)               65792     
                                                                     
     dense_5 (Dense)             (None, 3072)              789504    
                                                                     
     reshape_1 (Reshape)         (None, 32, 32, 3)         0         
                                                                     
    =================================================================
    Total params: 1641984 (6.26 MB)
    Trainable params: 1641984 (6.26 MB)
    Non-trainable params: 0 (0.00 Byte)
    _________________________________________________________________
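
One way to compare the two models is the compression ratio, the number of input values divided by the number of latent values. The short calculation below uses only the shapes stated in this document; `compression_ratio` is a hypothetical helper.

```python
def compression_ratio(input_shape, latent_dim):
    """Number of input values per latent value."""
    n_inputs = 1
    for dim in input_shape:
        n_inputs *= dim
    return n_inputs / latent_dim

print(compression_ratio((28, 28, 1), 64))    # 12.25
print(compression_ratio((32, 32, 3), 256))   # 12.0
```

Both models compress by roughly a factor of 12, so the larger latent space mainly keeps pace with the larger input rather than making the task easier.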

Train on CIFAR-10

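# Train the model
epochs = 15
# The input and the target are the same array (x_train)
for epoch in range(epochs):
    history = autoencoder.fit(
        x_train,
        x_train,
        epochs=1,
        batch_size=32,
        validation_split=0.10,
    )
    # Save a checkpoint after every epoch (SavedModel format in TF 2.x;
    # Keras 3 expects a .keras or .h5 file extension instead)
    autoencoder.save(f"models/AE-{epoch+1}.model")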

1407/1407 [==============================] - 6s 4ms/step - loss: 0.0098 - val_loss: 0.0096
1407/1407 [==============================] - 6s 4ms/step - loss: 0.0097 - val_loss: 0.0097

Results for CIFAR-10

After training the autoencoder on CIFAR-10, we visualize the original image, its latent space, and the reconstructed image.

# Generate the latent representation of an image with the encoder
example = encoder.predict([ x_test[0].reshape(-1, 32, 32, 3) ])
print('Shape of the latent space: ', example[0].shape)

1/1 [==============================] - 0s 37ms/step
Shape of the latent space: (256,)

# Plot the latent space (only 256 values instead of the 3072 input values)
#plt.imshow(example[0].reshape((16,16))) #Latent Space

# Show the original example
#plt.imshow(x_test[0])

# Reconstruct the original image with the autoencoder
ae_out = autoencoder.predict([ x_test[0].reshape(-1, 32, 32, 3) ])
img = ae_out[0]  # predict takes a batch and returns a batch, even for a single image, so we take element 0
#plt.imshow(ae_out[0])

1/1 [==============================] - 0s 55ms/step

# Visualize the results
fig = plt.figure(figsize=(10, 7))

# Original image
fig.add_subplot(1, 3, 1)
plt.imshow(x_test[0])
plt.axis('off')
plt.title("Original Image")

# Latent space (256 values shown as a 16x16 grid)
fig.add_subplot(1, 3, 2)
plt.imshow(example[0].reshape((16,16)))
plt.axis('off')
plt.title("Latent space")

# Reconstructed image
fig.add_subplot(1, 3, 3)
plt.imshow(ae_out[0])
plt.axis('off')
plt.title("Reconstructed image")

Conclusion

Understanding how autoencoders compress and reconstruct data opens the door to using latent spaces in tasks such as denoising, anomaly detection, and generative content creation.