This article was published as a part of the Data Science Blogathon
Many people who start learning Data Science and Machine Learning lose interest if they don’t get to play with interesting code in real-life projects, where they can work through the different stages of the Data Science project lifecycle.
So, in this article, I walk through 3 Data Science or Machine Learning projects with code. They are suitable for both beginners and practitioners who want to implement the projects themselves and get their hands dirty:
- Face Image Generation using Deep Convolutional Generative Adversarial Networks (DCGAN) with Pytorch
- Develop and Deploy a Face Mask Detector System with OpenCV, Keras and StreamLit
- Image Denoising Using AutoEncoders (Encoder-Decoder network) and U-Net architecture with Keras
Let’s start with the first one:
Face Image Generation using Deep Convolutional Generative Adversarial Networks (DCGAN) with Pytorch
Figure showing the architecture of Generative Adversarial Network (GAN)
Image 1
In this project, we train a Deep Convolutional Generative Adversarial Network (DCGAN) on the CelebFaces Attributes (CelebA) dataset. The objective is to obtain a Generator network that produces new images of human faces that look as real as possible. If you want to download the required dataset, then use this link.
To know about the theory behind the GAN and DCGAN, you can refer to this article
In short terms, we can define the GAN in the following way:
GANs can be understood as a two-player (i.e, Generator and Discriminator) non-cooperative game, where each player wishes to minimize its corresponding cost function.
Mount the Google Drive in Google Colab
from google.colab import drive
drive.mount('/content/drive')
In this project, we will be using a pre-processed version of the CelebFaces Attributes Dataset (CelebA): each image has been cropped to remove the parts that don’t include a face and then resized to a 64x64x3 NumPy image.
Unzipping the processed-CelebA-small zip
!unzip "/content/drive/MyDrive/processed-celeba-small.zip"
Give the Data directory
data_dir = 'processed_celeba_small/'
Importing Necessary Dependencies or Libraries
import numpy as np
import matplotlib.pyplot as plt
import pickle as pkl
%matplotlib inline
Visualize the CelebA Data
This dataset contains over 200,000 celebrity images with annotations or labels. They are colour images with 3 colour channels (RGB) each. After the transform below, each image is a square tensor of size image_size x image_size in the x and y dimensions.
!pip install torch torchvision
Import Necessary Modules of Pytorch
import torch from torchvision import datasets from torchvision import transforms
Batching the data with a DataLoader
Now, to access the images in batches we will create a DataLoader.
def get_dataloader(batch_size, image_size, data_dir='processed_celeba_small/'):
    transform = transforms.Compose([transforms.Resize(image_size), transforms.ToTensor()])
    image_dataset = datasets.ImageFolder(data_dir, transform=transform)
    return torch.utils.data.DataLoader(image_dataset, batch_size=batch_size, shuffle=True)
DataLoader Hyperparameters
- You can choose any reasonable batch_size.
- However, image_size must be 32, because the networks below are built for 32x32 inputs. Resizing to a smaller image (fewer pixels) makes training faster while still producing convincing images of faces.
batch_size = 64  # hyperparameter
img_size = 32
# dataloader with batch_size and img_size
celeba_train_loader = get_dataloader(batch_size, img_size)
Converting the Tensor Images into NumPy type and then transposing the dimension to display the Image
def imshow(img):
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))

dataiter = iter(celeba_train_loader)
images, _ = next(dataiter)  # the built-in next() works across PyTorch versions

# Plotting the images from a batch
fig = plt.figure(figsize=(20, 4))
plot_size = 20
for idx in np.arange(plot_size):
    # subplot counts must be integers, hence the floor division
    ax = fig.add_subplot(2, plot_size // 2, idx + 1, xticks=[], yticks=[])
    imshow(images[idx])
Output:
Scaling the image to a range of -1 to 1 (Assumption – input x is scaled from 0-1)
Now before beginning with the model definition, we will write a function to scale the image data to a pixel range of -1 to 1 which we will use while training. We do this because the output of a hyperbolic tangent activated generator will contain pixel values in a range from -1 to 1, and we need to rescale our training images in the range of [-1,1] as right now, they are in the range 0–1.
def scale(x, feature_range=(-1, 1)):
    # named low/high to avoid shadowing the built-in min and max
    low, high = feature_range
    x = x * (high - low) + low
    return x

img = images[0]
scaled_img = scale(img)
print('Min: ', scaled_img.min())  # check the range of the scaled img to be around -1 to 1
print('Max: ', scaled_img.max())
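To make the rescaling concrete, here is a small standalone sketch that applies the same formula to single pixel values and then maps them back to the 0-255 range, which is the inverse step needed later when generated images are displayed. The helper name to_uint8 is hypothetical and only used for this illustration.

```python
def scale(x, feature_range=(-1.0, 1.0)):
    """Map a value from [0, 1] to feature_range (same formula as above)."""
    low, high = feature_range
    return x * (high - low) + low


def to_uint8(y):
    """Map a value from [-1, 1] back to an integer pixel in [0, 255]."""
    return int((y + 1) * 255 / 2)


for pixel in (0.0, 0.5, 1.0):
    scaled = scale(pixel)
    print(pixel, "->", scaled, "->", to_uint8(scaled))
```

A pixel of 0 maps to -1, 0.5 maps to 0, and 1 maps to 1, which matches the output range of a tanh-activated generator.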
Defining the Model
A GAN is composed of two adversarial networks: a discriminator and a generator.
Discriminator
The discriminator is a convolutional classifier without max-pooling layers. The inputs to the discriminator are 32x32x3 tensor images, and the output is a single value (a logit) indicating whether the image is real or fake.
import torch.nn as nn
import torch.nn.functional as F

def conv(in_channels, out_channels, kernel_size, stride=2, padding=1, batch_norm=True):
    layers = []
    layers.append(nn.Conv2d(in_channels, out_channels, kernel_size,
                            stride=stride, padding=padding, bias=False))
    if batch_norm:
        layers.append(nn.BatchNorm2d(out_channels))
    return nn.Sequential(*layers)

class Discriminator(nn.Module):
    def __init__(self, conv_dim):
        """
        conv_dim - Depth of first convolutional layer
        """
        super(Discriminator, self).__init__()
        self.conv_dim = conv_dim
        # 3 conv layers followed by a fully-connected layer
        self.conv1 = conv(3, conv_dim, 4, batch_norm=False)
        self.conv2 = conv(conv_dim, conv_dim*2, 4)
        self.conv3 = conv(conv_dim*2, conv_dim*4, 4)
        self.fc = nn.Linear(conv_dim*4*4*4, 1)

    def forward(self, x):
        """
        x - The input to the neural network
        returns the discriminator logits (output)
        """
        y = F.leaky_relu(self.conv1(x), 0.2)
        y = F.leaky_relu(self.conv2(y), 0.2)
        y = F.leaky_relu(self.conv3(y), 0.2)
        out = y.view(-1, self.conv_dim*4*4*4)  # flattening
        out = self.fc(out)  # output layer
        return out
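The conv_dim*4*4*4 input size of the fully-connected layer follows from the convolution arithmetic. As a quick check, the sketch below applies the standard output-size formula for a convolution (dilation 1) three times with kernel 4, stride 2 and padding 1. The helper conv_out is a hypothetical name used only here, and conv_dim = 64 is the value chosen later in the article.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a convolution (dilation 1, floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


conv_dim = 64   # same value as d_conv_dim used later in the article
size = 32       # input images are 32x32
sizes = []
for _ in range(3):  # conv1, conv2, conv3
    size = conv_out(size)
    sizes.append(size)

# conv3 has conv_dim*4 channels on a size x size feature map
flat_features = (conv_dim * 4) * size * size
print(sizes)          # [16, 8, 4]
print(flat_features)  # 4096
```

Each layer halves the spatial size (32, 16, 8, 4), so the flattened vector has conv_dim*4 channels times 4x4 positions.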
Generator
This component of the GAN learns to create fake data by incorporating feedback from the discriminator: it tries to make the discriminator classify its output as real. It upsamples its input and generates a new image of the same size as our training data, 32x32x3. The inputs are vectors of some length z_size, while the output is an image of shape 32x32x3.
def deconv(in_channels, out_channels, kernel_size, stride=2, padding=1, batch_norm=True):
    layers = []
    layers.append(nn.ConvTranspose2d(in_channels, out_channels, kernel_size,
                                     stride=stride, padding=padding, bias=False))
    if batch_norm:
        layers.append(nn.BatchNorm2d(out_channels))
    return nn.Sequential(*layers)

class Generator(nn.Module):
    def __init__(self, z_size, conv_dim):
        """
        z_size - Length of the input latent vector z
        conv_dim - Depth of the input to the last transpose conv layer
        """
        super(Generator, self).__init__()
        self.conv_dim = conv_dim
        self.fc2 = nn.Linear(z_size, conv_dim*4*4*4)
        self.t_conv1 = deconv(conv_dim*4, conv_dim*2, 4)
        self.t_conv2 = deconv(conv_dim*2, conv_dim, 4)
        self.t_conv3 = deconv(conv_dim, 3, 4, batch_norm=False)

    def forward(self, x):
        """
        x - input
        output - 32x32x3 tensor image
        """
        y = self.fc2(x)
        y = y.view(-1, self.conv_dim*4, 4, 4)
        z = F.relu(self.t_conv1(y))
        z = F.relu(self.t_conv2(z))
        z = torch.tanh(self.t_conv3(z))
        return z
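The generator mirrors the discriminator: starting from a 4x4 feature map, each transposed convolution doubles the spatial size until the output reaches 32x32. A minimal sketch of that arithmetic, assuming no output padding and dilation 1 (the helper deconv_out is a hypothetical name for this illustration):

```python
def deconv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a transposed convolution
    (output_padding 0, dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel


size = 4    # the fully-connected output is reshaped to conv_dim*4 x 4 x 4
sizes = []
for _ in range(3):  # t_conv1, t_conv2, t_conv3
    size = deconv_out(size)
    sizes.append(size)

print(sizes)  # [8, 16, 32]
```

With kernel 4, stride 2 and padding 1 the formula reduces to exactly twice the input size, which is why three layers take 4x4 to 32x32.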
Weight Initialization
To help the models converge quickly, we initialize the weights of the convolutional and linear layers in the model based on the original DCGAN paper, which says – “All weights are initialized from a zero-centred Normal distribution with a standard deviation of 0.02”
def weights_init_normal(m):
    """
    Weights are obtained from N(0, 0.02) distribution
    m: layer
    """
    classname = m.__class__.__name__
    # str.find returns -1 when there is no match, so compare explicitly
    is_conv_or_linear = classname.find('Conv') != -1 or classname.find('Linear') != -1
    if hasattr(m, 'weight') and is_conv_or_linear:
        m.weight.data.normal_(0.0, 0.02)
        # the conv layers above are built with bias=False, so bias can be None
        if getattr(m, 'bias', None) is not None:
            m.bias.data.fill_(0)
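When layers are selected by class name with str.find, it is worth remembering that the method returns the index of the match (0 for a match at the start of the string) and -1 for no match. Used directly as a truth value, those results read the wrong way round, so the comparison against -1 has to be explicit. The sketch below illustrates this on plain strings; the helper is_conv_or_linear is a hypothetical name.

```python
def is_conv_or_linear(classname):
    """True for class names containing 'Conv' or 'Linear'."""
    return classname.find('Conv') != -1 or classname.find('Linear') != -1


for name in ['Conv2d', 'ConvTranspose2d', 'Linear', 'BatchNorm2d', 'Sequential']:
    print(name, is_conv_or_linear(name))

# Truthiness pitfall: a match at index 0 is falsy, while "no match" (-1) is truthy
print(bool('Conv2d'.find('Conv')))       # False
print(bool('BatchNorm2d'.find('Conv')))  # True
```

Writing `'Conv' in classname` would be an equivalent and arguably more readable test.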
Building the complete network
To build our network, we will define the model hyperparameters and instantiate the discriminator and generator from the classes defined in the Defining Model section.
# instantiate the discriminator and generator
def complete_network(d_conv_dim, g_conv_dim, z_size):
    D = Discriminator(d_conv_dim)
    G = Generator(z_size=z_size, conv_dim=g_conv_dim)
    D.apply(weights_init_normal)  # initialize the model weights
    G.apply(weights_init_normal)
    print(D)
    print(G)
    return D, G

# model hyperparameter
d_conv_dim = 64
g_conv_dim = 64
z_size = 100
D, G = complete_network(d_conv_dim, g_conv_dim, z_size)

import torch
train_on_gpu = torch.cuda.is_available()
if not train_on_gpu:  # to ensure the training on GPU if available
    print('No GPU')
else:
    print('GPU Available.Training...')
Loss Calculation – Discriminator and Generative Loss
Discriminator: Total Loss = loss_real-image + loss_fake-image
For the Discriminator, the target output is 1 for a real image and 0 for a fake image. The Generator loss is computed with the labels flipped: it rewards the generator when the discriminator classifies a generated image as real.
def real_loss(D_out):
    '''
    D_out - discriminator logits
    output - real loss
    '''
    batch_size = D_out.size(0)
    labels = torch.ones(batch_size) * 0.9  # one sided label smoothing
    if train_on_gpu:
        labels = labels.cuda()
    criterion = nn.BCEWithLogitsLoss()  # binary-cross entropy with logits loss
    loss = criterion(D_out.squeeze(), labels)  # loss calculation
    return loss

def fake_loss(D_out):
    '''
    D_out: discriminator logits
    output - fake loss
    '''
    batch_size = D_out.size(0)
    labels = torch.zeros(batch_size)  # fake labels = 0
    if train_on_gpu:
        labels = labels.cuda()
    criterion = nn.BCEWithLogitsLoss()
    loss = criterion(D_out.squeeze(), labels)  # loss calculation
    return loss
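To see what these losses compute, here is a pure-Python sketch of binary cross-entropy on a single raw logit, written in the usual numerically stable form. It is an illustration of the formula only, not PyTorch's implementation, and the example logit of 2.0 is an arbitrary value chosen for the demonstration.

```python
import math


def bce_with_logits(logit, target):
    """Binary cross-entropy on a raw logit:
    -[t*log(sigmoid(x)) + (1-t)*log(1-sigmoid(x))],
    rewritten as max(x, 0) - x*t + log(1 + exp(-|x|)) for stability."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


logit = 2.0  # the discriminator leans towards "real"
print(bce_with_logits(logit, 0.9))  # real image, smoothed label 0.9: small loss
print(bce_with_logits(logit, 0.0))  # fake image, label 0: large loss
print(bce_with_logits(0.0, 1.0))    # undecided discriminator: log(2)
```

The same function with target 0.9 applied to the discriminator's output on generated images gives the generator loss, which is what calling real_loss on fake images does in the training loop.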
import torch.optim as optim

# params
lr_d = 0.0002
lr_g = 0.0002
beta1 = 0.5
beta2 = 0.999  # default

# Using Adam Optimizers
d_optimizer = optim.Adam(D.parameters(), lr_d, [beta1, beta2])
g_optimizer = optim.Adam(G.parameters(), lr_g, [beta1, beta2])
Model Training
During training, we alternate between the discriminator and the generator. Here we use the real_loss and fake_loss functions to compute the losses for the Discriminator and the Generator.
- Firstly, train the discriminator by alternating between real and fake images of human faces
- Then the generator component tries to trick the discriminator and should have an opposing loss function
def train(D, G, n_epochs, print_every=50):
    '''
    D - the discriminator network
    G - the generator network
    n_epochs - number of epochs
    print_every - interval to print and record the models losses
    output - D and G loss
    '''
    if train_on_gpu:
        D.cuda()
        G.cuda()

    # loss and generated "fake" samples
    losses = []
    samples = []

    # data for sampling is fixed - it is constant throughout training
    # and helps to inspect the performance of the model
    sample_size = 16
    fixed_z = np.random.uniform(-1, 1, size=(sample_size, z_size))
    fixed_z = torch.from_numpy(fixed_z).float()
    if train_on_gpu:
        fixed_z = fixed_z.cuda()

    for epoch in range(n_epochs):  # epoch
        for batch_i, (real_images, _) in enumerate(celeba_train_loader):  # batch train loop
            batch_size = real_images.size(0)
            real_images = scale(real_images)
            if train_on_gpu:
                real_images = real_images.cuda()

            # discriminator step: real images, then fake images
            d_optimizer.zero_grad()
            out_real = D(real_images)
            d_loss_real = real_loss(out_real)
            z = np.random.uniform(-1, 1, size=(batch_size, z_size))
            z = torch.from_numpy(z).float()
            if train_on_gpu:
                z = z.cuda()
            fake_out = G(z)
            out_fake = D(fake_out)
            d_loss_fake = fake_loss(out_fake)
            d_loss = d_loss_real + d_loss_fake
            d_loss.backward()
            d_optimizer.step()

            # generator step: fake images scored against "real" labels
            g_optimizer.zero_grad()
            z = np.random.uniform(-1, 1, size=(batch_size, z_size))
            z = torch.from_numpy(z).float()
            if train_on_gpu:
                z = z.cuda()
            fake_out_g = G(z)
            G_D_out = D(fake_out_g)
            g_loss = real_loss(G_D_out)
            g_loss.backward()
            g_optimizer.step()

            if batch_i % print_every == 0:
                losses.append((d_loss.item(), g_loss.item()))  # append D and G loss
                # print the stats
                print('Epoch [{:5d}/{:5d}] | d_loss: {:6.4f} | g_loss: {:6.4f}'.format(
                    epoch+1, n_epochs, d_loss.item(), g_loss.item()))

        # after each epoch, generate samples from the fixed noise
        G.eval()
        samples_z = G(fixed_z)
        samples.append(samples_z)
        G.train()

    with open('train_samples.pkl', 'wb') as f:  # pkl file
        pkl.dump(samples, f)

    return losses

n_epochs = 30  # number of epochs
losses = train(D, G, n_epochs=n_epochs)  # Training
Plotting the discriminator and generator loss after each epoch
fig, ax = plt.subplots()
losses = np.array(losses)
plt.plot(losses.T[0], label='Discriminator')
plt.plot(losses.T[1], label='Generator')
plt.title("Train Loss")
plt.legend()
Output:
Generate Sample from Training
# View the generated samples stored for a given epoch
def view_samples(epoch, samples):
    fig, axes = plt.subplots(figsize=(20, 4), nrows=2, ncols=8, sharex=True, sharey=True)
    for ax, img in zip(axes.flatten(), samples[epoch]):
        img = img.detach().cpu().numpy()
        img = np.transpose(img, (1, 2, 0))
        img = ((img + 1) * 255 / (2)).astype(np.uint8)  # from [-1, 1] back to [0, 255]
        ax.xaxis.set_visible(False)
        ax.yaxis.set_visible(False)
        im = ax.imshow(img.reshape((32, 32, 3)))

with open('train_samples.pkl', 'rb') as f:
    samples = pkl.load(f)
v_s = view_samples(-1, samples)
Output:
Conclusion
By looking at the output, you can see that our model generates new images of fake human faces that look fairly realistic. Note that all the faces are light in shade; even the brown faces are a bit lighter. This is because the CelebA dataset is somewhat biased, as it consists of “celebrity” faces that are mostly white. Overall, our DCGAN model produces nearly real images from mere noise.
You can also check my GitHub repo for this project.
Let’s move to the second project: 👇
Develop and Deploy a Face Mask Detector System with OpenCV, Keras and StreamLit
Abstract: During the COVID-19 crisis, wearing masks has been necessary for public health and for controlling the spread of the pandemic. As Machine Learning students, what if we made a system that could monitor whether the people around us are complying with these safety measures? In this project, we build a face mask detector system that detects whether a person is wearing a mask, and we also deploy the model as a web app so that it can be used in production.
Image 2
Model Architecture and Training
In this project, I have made use of Transfer Learning, which keeps the task simple. I used the MobileNetV2 model as the base of my classifier network.
By using Transfer Learning, I reuse the feature detection capabilities of the pre-trained MobileNetV2 and apply them to our rather simple model. MobileNetV2 is followed by our own head, which consists of average pooling, Flatten, Dense, and Dropout layers. Although ours is a binary classification problem, the labels are one-hot encoded, so the final layer has 2 neurons with softmax activation.
I follow the common practice of using the Adam optimizer with the categorical cross-entropy loss, a combination that generally converges well for this kind of network.
Import the necessary packages
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import AveragePooling2D
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import os
Construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True, help="path to input dataset")
ap.add_argument("-p", "--plot", type=str, default="plot.png", help="path to output loss/accuracy plot")
ap.add_argument("-m", "--model", type=str, default="mask_detector.model", help="path to output face mask detector model")
args = vars(ap.parse_args())
Initialize the learning rate, number of epochs, and batch size
INIT_LR = 1e-4
EPOCHS = 20
BS = 32
Grab the list of images in our dataset directory, then initialize the lists of data and class labels
print("[INFO] loading images...")
imagePaths = list(paths.list_images(args["dataset"]))
data = []
labels = []
Loop over the image paths
for imagePath in imagePaths:
    # Fetch the class label from the name of the parent directory
    label = imagePath.split(os.path.sep)[-2]
    # load the input image at 224x224 and preprocess it
    image = load_img(imagePath, target_size=(224, 224))
    image = img_to_array(image)
    image = preprocess_input(image)
    # update the lists of data and labels
    data.append(image)
    labels.append(label)
Convert the data and labels to NumPy arrays
data = np.array(data, dtype="float32")
labels = np.array(labels)
Perform one-hot encoding on the labels
lb = LabelBinarizer()
labels = lb.fit_transform(labels)
labels = to_categorical(labels)
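The effect of the two encoding steps can be illustrated without any library: class names are sorted, each label is mapped to its class index, and the index is expanded into a two-element one-hot vector. The class names with_mask and without_mask below are assumptions for illustration; the real names come from the directory names in your dataset.

```python
# Hypothetical labels, as they would be read from the directory names
labels = ["with_mask", "without_mask", "with_mask"]

classes = sorted(set(labels))  # classes are kept in sorted order
index = {name: i for i, name in enumerate(classes)}

# Expand each class index into a one-hot row
one_hot = [
    [1.0 if index[label] == i else 0.0 for i in range(len(classes))]
    for label in labels
]

print(classes)  # ['with_mask', 'without_mask']
print(one_hot)  # [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
```

This two-column layout is what the 2-neuron softmax output of the model is trained against.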
Partition the data into training and testing splits using 80% of the data for training and the remaining 20% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels, test_size=0.20, stratify=labels, random_state=42)
Construct the training image generator for data augmentation
aug = ImageDataGenerator(
    rotation_range=20,
    zoom_range=0.15,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.15,
    horizontal_flip=True,
    fill_mode="nearest")
Load the MobileNetV2 network, ensuring the head Fully Connected layer sets are left off
baseModel = MobileNetV2(weights="imagenet", include_top=False, input_tensor=Input(shape=(224, 224, 3)))
Construct the head of the model that will be placed on top of the base model
headModel = baseModel.output
headModel = AveragePooling2D(pool_size=(7, 7))(headModel)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(128, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(2, activation="softmax")(headModel)
Place the head Fully Connected model on top of the base model (this becomes the actual model we will train)
model = Model(inputs=baseModel.input, outputs=headModel)
Loop over all the layers in the base model and freeze them so they are not updated during the first training process
for layer in baseModel.layers:
    layer.trainable = False
Model Compilation
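print("[INFO] compiling model...")
# newer Keras releases expect learning_rate instead of lr; the decay argument
# may only be accepted by the legacy optimizers in recent versions
opt = Adam(learning_rate=INIT_LR, decay=INIT_LR / EPOCHS)
# two-unit softmax output with one-hot labels, so use categorical cross-entropy
model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])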
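For intuition about the decay argument: in the older (legacy) Keras optimizers it applies, to the best of my knowledge, a time-based decay per update step, lr = initial_lr / (1 + decay * step). The sketch below evaluates that formula with the values used above; treat it as an illustration of the schedule under that assumption rather than a statement about every Keras version.

```python
import math

INIT_LR = 1e-4
EPOCHS = 20
decay = INIT_LR / EPOCHS  # 5e-6, as passed to Adam above


def decayed_lr(step):
    """Time-based decay (assumed legacy Keras behaviour), per update step."""
    return INIT_LR / (1.0 + decay * step)


for step in (0, 1_000, 10_000, 200_000):
    print(step, decayed_lr(step))
```

With such a small decay value the learning rate barely moves over a few thousand steps; it only halves after about 200,000 updates.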
Training the head of the network
print("[INFO] training head...")
H = model.fit(
    aug.flow(trainX, trainY, batch_size=BS),
    steps_per_epoch=len(trainX) // BS,
    validation_data=(testX, testY),
    validation_steps=len(testX) // BS,
    epochs=EPOCHS)
Make predictions on the testing set
print("[INFO] evaluating network...")
predIdxs = model.predict(testX, batch_size=BS)
For each image in the testing set, find the index of the label with the largest predicted probability
predIdxs = np.argmax(predIdxs, axis=1)
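As a small illustration of this step, the sketch below takes a few made-up probability rows (one per image, one column per class) and picks the index of the largest value in each row, which is what np.argmax with axis=1 does. The probabilities and class names are hypothetical.

```python
# Hypothetical softmax outputs for three test images
probs = [
    [0.92, 0.08],
    [0.30, 0.70],
    [0.51, 0.49],
]
classes = ["with_mask", "without_mask"]  # assumed class order

# Index of the largest probability in each row
pred_idxs = [max(range(len(row)), key=row.__getitem__) for row in probs]
pred_labels = [classes[i] for i in pred_idxs]

print(pred_idxs)    # [0, 1, 0]
print(pred_labels)  # ['with_mask', 'without_mask', 'with_mask']
```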
Print the Classification Report
print(classification_report(testY.argmax(axis=1), predIdxs, target_names=lb.classes_))
Output:
Serialize the model to disk
print("[INFO] saving mask detector model...")
model.save(args["model"], save_format="h5")
Plot the training loss and accuracy
N = EPOCHS
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, N), H.history["accuracy"], label="train_acc")
plt.plot(np.arange(0, N), H.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig(args["plot"])
Output:
Output for some Test Image
In this section, I show the kind of output you get after passing an image as input to our face mask detector system.
Conclusion
Hurray, we have successfully 🥳 built our face mask detector system with an accuracy of around 99%. I have also created a web application for this model. You can find that code on my GitHub, try to build the app yourself, and use it in a production environment.
You can also check my GitHub repo for this project.
Let’s move to the third and final project: 👇
Image Denoising Using AutoEncoders (Encoder-Decoder network) and U-Net architecture with Keras
The general autoencoder architecture is shown below:
Figure Showing the architecture of an autoencoder model
Image 3
Before going directly into the code portion, I suggest you first go through this tutorial on autoencoders and then go ahead with this project for a better understanding of both theoretical and practical knowledge.
Import Necessary Dependencies or Libraries
Firstly, we have to import all the necessary python libraries or modules which we are going to use in this implementation.
import numpy as np  # Optimizing matrix operations
import matplotlib.pyplot as plt  # Data Visualization
from tensorflow.keras.layers import Conv2D, Input, Dense, Reshape, Conv2DTranspose, Activation, BatchNormalization, ReLU, Concatenate, add, LeakyReLU
from tensorflow.keras.models import Model  # Functional keras model
from tensorflow.keras.callbacks import ModelCheckpoint  # To save the model weights based on the validation error
from tensorflow.keras.datasets import cifar100, cifar10  # Required datasets used in this problem statement
from tensorflow.keras.optimizers import Adam  # Adam optimizer (imported from tensorflow.keras to avoid mixing it with standalone keras)
Load the CIFAR-100 Dataset From Keras Directly
For implementing this we will be using the famous CIFAR-100 dataset as input. For this, we don’t need to download the dataset as we can import it from the Keras library directly.
# Used the CIFAR-100 dataset
(train_data_clean, _), (test_data_clean, _) = cifar100.load_data(label_mode='fine')
Normalize our data between 0 and 1
Now, we scale our data down to the range [0,1], which matches the sigmoid output of the models below.
# To normalize our data, we will divide all the image pixels by float(255)
train_data_clean = train_data_clean.astype('float32') / 255.
test_data_clean = test_data_clean.astype('float32') / 255.
Add the Noise to the Input Images
Now, we need to add noise to generate the noisy images. To do so, we generate an array with the same dimensions as our images, filled with random values drawn from a normal distribution with mean = 0 (the code below uses a standard deviation of 0.1).
To generate normal distribution, we can use np.random.normal(loc,scale,size). Then scale the noise by some factor, here I am using 0.5. After adding noise, pixel values can be out of range, so we need to clip the values using np.clip(arr, arr_min, arr_max ).
# Function to add the noise in our images and clipping its pixel values between 0 and 1
def add_noise_and_clip_data(data, noise_factor):
    noise = np.random.normal(loc=0.0, scale=0.1, size=data.shape)
    data = data + noise_factor * noise
    data = np.clip(data, 0., 1.)
    return data

train_data_noisy = add_noise_and_clip_data(train_data_clean, 0.5)
test_data_noisy = add_noise_and_clip_data(test_data_clean, 0.5)
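The same idea can be tried on a handful of pixel values using only the standard library. This is a simplified sketch of the noise-and-clip step, not a replacement for the NumPy function above; the function name, the seed, and the sample pixel values are assumptions for illustration.

```python
import random


def add_noise_and_clip(pixels, noise_factor, sigma=0.1, seed=0):
    """Add zero-mean Gaussian noise to each pixel, then clip to [0, 1]."""
    rng = random.Random(seed)  # seeded only to make the demo repeatable
    noisy = [p + noise_factor * rng.gauss(0.0, sigma) for p in pixels]
    return [min(1.0, max(0.0, v)) for v in noisy]


clean = [0.0, 0.02, 0.5, 0.98, 1.0]
noisy = add_noise_and_clip(clean, noise_factor=0.5)
print(noisy)
```

Multiplying Gaussian noise by noise_factor scales its standard deviation by the same factor, so with sigma = 0.1 and noise_factor = 0.5 the effective standard deviation is 0.05. Clipping matters most for pixels near 0 or 1, where the noise can push values out of range.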
Visualize a few training images with their noisy images
Let’s see what our training data looks like along with the corresponding noisy images
rows = 2  # defining no. of rows in figure
cols = 8  # defining no. of columns in figure
f = plt.figure(figsize=(2*cols, 2*rows*2))  # defining a figure
for i in range(rows):
    for j in range(cols):
        f.add_subplot(rows*2, cols, (2*i*cols)+(j+1))  # adding subplot to figure on each iteration
        plt.imshow(train_data_noisy[i*cols + j])
        plt.axis("off")
    for j in range(cols):
        f.add_subplot(rows*2, cols, ((2*i+1)*cols)+(j+1))  # adding subplot to figure on each iteration
        plt.imshow(train_data_clean[i*cols + j])
        plt.axis("off")
f.suptitle("Sample Training Data", fontsize=20)
plt.show()
Output:
Define a Simple CNN Architecture
Here we define two functions i.e, one for convolution operations and the other for deconvolution operation to include the encoder and decoder blocks in our customized autoencoder model.
# Function to include the convolution layers in our model architecture
def conv_block(x, filters, kernel_size, strides=2):
    x = Conv2D(filters=filters, kernel_size=kernel_size, strides=strides, padding='same')(x)
    x = BatchNormalization()(x)
    x = ReLU()(x)
    return x

# Function to include the de-convolution layers in our model architecture
def deconv_block(x, filters, kernel_size):
    x = Conv2DTranspose(filters=filters, kernel_size=kernel_size, strides=2, padding='same')(x)
    x = BatchNormalization()(x)
    x = ReLU()(x)
    return x
Function to denoise the images given to the model
Now we define our main function, which is the core of our problem statement: it builds the model that denoises the images we give as input.
def denoising_autoencoder():
    den_inputs = Input(shape=(32, 32, 3), name='dae_input')
    # encoder
    conv_block1 = conv_block(den_inputs, 32, 3)
    conv_block2 = conv_block(conv_block1, 64, 3)
    conv_block3 = conv_block(conv_block2, 128, 3)
    conv_block4 = conv_block(conv_block3, 256, 3)
    conv_block5 = conv_block(conv_block4, 256, 3, 1)
    # decoder, concatenating the encoder feature maps of matching size
    deconv_block1 = deconv_block(conv_block5, 256, 3)
    merge1 = Concatenate()([deconv_block1, conv_block3])
    deconv_block2 = deconv_block(merge1, 128, 3)
    merge2 = Concatenate()([deconv_block2, conv_block2])
    deconv_block3 = deconv_block(merge2, 64, 3)
    merge3 = Concatenate()([deconv_block3, conv_block1])
    deconv_block4 = deconv_block(merge3, 32, 3)
    final_deconv = Conv2DTranspose(filters=3, kernel_size=3, padding='same')(deconv_block4)
    den_outputs = Activation('sigmoid', name='dae_output')(final_deconv)
    return Model(den_inputs, den_outputs, name='dae')
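The Concatenate layers only work if the decoder and encoder feature maps have the same spatial size. The sketch below traces the sizes through the model, using the rule that with 'same' padding a strided convolution outputs ceil(size / stride) and a transposed convolution outputs size * stride. The helper names are hypothetical and only used for this check.

```python
import math


def conv_same(size, stride):
    """Output size of a 'same'-padded convolution."""
    return math.ceil(size / stride)


def deconv_same(size, stride=2):
    """Output size of a 'same'-padded transposed convolution."""
    return size * stride


size = 32
encoder = []
for stride in (2, 2, 2, 2, 1):  # conv_block1 .. conv_block5
    size = conv_same(size, stride)
    encoder.append(size)
print(encoder)  # [16, 8, 4, 2, 2]

d1 = deconv_same(encoder[4])  # 4, concatenated with conv_block3
d2 = deconv_same(d1)          # 8, concatenated with conv_block2
d3 = deconv_same(d2)          # 16, concatenated with conv_block1
d4 = deconv_same(d3)          # 32, back to the input resolution
print(d1, d2, d3, d4)         # 4 8 16 32
```

Each decoder stage lines up with the encoder block it is merged with, and the final stage restores the 32x32 input resolution.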
Function Calling, Model Compilation, and Training
In Python, after defining the function, we call it to build the model object. We then compile and train the model. You can choose hyperparameters such as epochs, batch_size, etc. on your own.
dae = denoising_autoencoder()  # Function Calling
dae.compile(loss='mse', optimizer='adam')  # Model Compilation
checkpoint = ModelCheckpoint('best_model.h5', verbose=1, save_best_only=True, save_weights_only=True)  # Save the best weights
# Training or fitting the model
dae.fit(train_data_noisy, train_data_clean,
        validation_data=(test_data_noisy, test_data_clean),
        epochs=5, batch_size=128, callbacks=[checkpoint])
Load the best model weights and predict the denoised images using the above-trained model
Here, we load the best weights, which were saved in the h5 file, and use them to predict the outputs for our test dataset.
dae.load_weights('best_model.h5')  # load the weights which we have saved in our previous section
test_data_denoised = dae.predict(test_data_noisy)  # Predict the output images using the trained model of best weights
Print the original, noisy, and denoised version of an image
idx = 4
plt.subplot(1, 3, 1)
plt.imshow(test_data_clean[idx])
plt.title('original')
plt.subplot(1, 3, 2)
plt.imshow(test_data_noisy[idx])
plt.title('noisy')
plt.subplot(1, 3, 3)
plt.imshow(test_data_denoised[idx])
plt.title('denoised')
plt.show()
Output:
Evaluate the Model using MSE
Now, we will define a function to calculate the difference between two images. Here we use the mean squared error as the metric.
def mse(image_1, image_2):
    return np.square(np.subtract(image_1, image_2)).mean()

noisy_clean_mse = mse(test_data_clean, test_data_noisy)  # MSE between clean and noisy test data
denoised_clean_mse = mse(test_data_denoised, test_data_clean)  # MSE between denoised (model output) and clean test data
noisy_clean_mse, denoised_clean_mse  # Displaying both the MSE values
Testing our DAE on the CIFAR10 dataset
Now, after the completion of model training, it’s time to test our model.
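Find the MSE between the clean images and the noisy and denoised versions we want to compare
# The article does not show how the CIFAR-10 arrays are built; the next four lines are an assumed reconstruction
(_, _), (cifar10_test, _) = cifar10.load_data()
cifar10_test = cifar10_test.astype('float32') / 255.
cifar10_test_noisy = add_noise_and_clip_data(cifar10_test, 0.5)
cifar10_test_denoised = dae.predict(cifar10_test_noisy)
clean_noisy = mse(cifar10_test, cifar10_test_noisy)
clean_denoised = mse(cifar10_test, cifar10_test_denoised)
print("The difference between the two images is:", clean_noisy - clean_denoised)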
Now, Let’s design an Encoder-Decoder Network with Skip Connections (i.e, U-Net Architecture)
Skip connections play a very important role in any network that performs both convolution and deconvolution operations. They help restore information that can be lost during convolution and deconvolution.
The U-Net architecture is shown below:
Figure showing the U-Net architecture
Image4
Now, let’s start with the code portion:
size = 32
channel = 3
# imported from tensorflow.keras to stay consistent with the imports above
from tensorflow.keras.layers import Conv2D, Input, Dense, Dropout, MaxPool2D, UpSampling2D

# Encoder Component of our autoencoder network
inputs = Input(shape=(size, size, channel))
x = Conv2D(32, 3, activation='relu', padding='same')(inputs)
x = BatchNormalization()(x)
x = MaxPool2D()(x)
x = Dropout(0.5)(x)
skip = Conv2D(32, 3, padding='same')(x)  # skip connection for decoder
x = LeakyReLU()(skip)
x = BatchNormalization()(x)
x = MaxPool2D()(x)
x = Dropout(0.5)(x)
x = Conv2D(64, 3, activation='relu', padding='same')(x)
x = BatchNormalization()(x)
encoded = MaxPool2D()(x)

# Decoder Component of our autoencoder network
x = Conv2DTranspose(64, 3, activation='relu', strides=(2, 2), padding='same')(encoded)
x = BatchNormalization()(x)
x = Dropout(0.5)(x)
x = Conv2DTranspose(32, 3, activation='relu', strides=(2, 2), padding='same')(x)
x = BatchNormalization()(x)
x = Dropout(0.5)(x)
x = Conv2DTranspose(32, 3, padding='same')(x)
x = add([x, skip])  # adding skip connection
x = LeakyReLU()(x)
x = BatchNormalization()(x)
decoded = Conv2DTranspose(3, 3, activation='sigmoid', strides=(2, 2), padding='same')(x)

autoencoder = Model(inputs, decoded)
# Compiling the model
autoencoder.compile(optimizer=Adam(learning_rate=0.001), loss='binary_crossentropy')
# Analyzing model summary
autoencoder.summary()
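The two models in this project merge their skip connections differently: this one uses add, which requires identical shapes and keeps the channel count, while the earlier denoising autoencoder uses Concatenate, which stacks channels. The sketch below shows the shape arithmetic on (height, width, channels) tuples; the helper names are hypothetical.

```python
def add_shape(a, b):
    """Element-wise addition: both shapes must be identical."""
    if a != b:
        raise ValueError(f"cannot add shapes {a} and {b}")
    return a


def concat_shape(a, b):
    """Channel-wise concatenation: spatial dims must match, channels add up."""
    if a[:2] != b[:2]:
        raise ValueError(f"cannot concatenate shapes {a} and {b}")
    return (a[0], a[1], a[2] + b[2])


# This model: decoder output and `skip` are both 16x16 with 32 channels
print(add_shape((16, 16, 32), (16, 16, 32)))    # (16, 16, 32)

# Earlier model: deconv_block1 (4x4x256) concatenated with conv_block3 (4x4x128)
print(concat_shape((4, 4, 256), (4, 4, 128)))   # (4, 4, 384)
```

Addition keeps the decoder narrow, whereas concatenation lets the next layer see both feature maps side by side at the cost of more input channels.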
Training or Fitting the Model
# Fix the number of epochs and batch_size as hyperparameter
epochs = 25
batch_size = 256
history = autoencoder.fit(train_data_noisy, train_data_clean,
                          epochs=epochs,
                          batch_size=batch_size,
                          shuffle=True,
                          validation_data=(test_data_noisy, test_data_clean))
Drawing the Loss vs Number of Epochs Curve
# Defining Figure
f = plt.figure(figsize=(10, 7))
f.add_subplot()  # Adding Subplots
plt.plot(history.epoch, history.history['loss'], label="loss")  # Loss curve for training set
plt.plot(history.epoch, history.history['val_loss'], label="val_loss")  # Loss curve for validation set
plt.title("Loss Curve", fontsize=18)
plt.xlabel("Epochs", fontsize=15)
plt.ylabel("Loss", fontsize=15)
plt.grid(alpha=0.3)
plt.legend()
plt.savefig("Loss_curve_cifar10.png")
plt.show()
Select few random test images
# Number of images to be selected num_imgs = 48 rand = np.random.randint(1, test_data_noisy.shape[0]-48) cifar_test_images = test_data_noisy[rand:rand+num_imgs] # slicing cifar_test_denoised = autoencoder.predict(test_data_clean) # predict
Visualize test images with their denoised images
rows = 4 # defining no. of rows in figure cols = 12 # defining no. of columns in figure cell_size = 1.5 f = plt.figure(figsize=(cell_size*cols,cell_size*rows*2)) # defining a figure f.tight_layout() for i in range(rows): for j in range(cols): f.add_subplot(rows*2,cols, (2*i*cols)+(j+1)) # adding subplot to figure on each iteration plt.imshow(test_data_clean[i*cols + j]) plt.axis("off") for j in range(cols): f.add_subplot(rows*2,cols,((2*i+1)*cols)+(j+1)) # adding subplot to figure on each iteration plt.imshow(test_data_noisy[i*cols + j]) plt.axis("off") f.suptitle("Autoencoder Results - Cifar10",fontsize=18) plt.show()
Now, we will define a function to quantify the difference between two images. Here we use the mean squared error (MSE) as the evaluation metric. Note that the model itself was trained with the binary cross-entropy loss.
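def mse(data_1, data_2):
    return np.square(np.subtract(data_1, data_2)).mean()

# Denoise the full noisy test set so that it can be compared with the clean images
test_data_denoised = autoencoder.predict(test_data_noisy)

noisy_clean_mse = mse(test_data_clean, test_data_noisy)
denoised_clean_mse = mse(test_data_denoised, test_data_clean)
noisy_clean_mse, denoised_clean_mse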
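MSE values are often easier to interpret when converted to the peak signal-to-noise ratio (PSNR), measured in decibels. The sketch below assumes that pixel values are scaled to the range [0, 1], which the sigmoid output layer suggests but the section does not state. The helper psnr_from_mse is a hypothetical name.

```python
import math


def psnr_from_mse(mse_value, max_value=1.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error.

    max_value is the largest possible pixel value (assumed 1.0 here).
    """
    if mse_value == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10((max_value ** 2) / mse_value)


# Illustrative input only: an MSE of 0.01 on [0, 1] images.
print(psnr_from_mse(0.01))
```

A higher PSNR means the image is closer to the clean reference, so the denoised images should score higher than the noisy ones if the model is doing its job.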
Predict the denoised version of an image
cifar10_test_denoised = autoencoder.predict(cifar10_test_noisy)
Print the original, noisy, and denoised version of an image
idx = 6

plt.subplot(1,3,1)
plt.imshow(cifar10_test[idx])
plt.title('original')

plt.subplot(1,3,2)
plt.imshow(cifar10_test_noisy[idx])
plt.title('noisy')

plt.subplot(1,3,3)
plt.imshow(cifar10_test_denoised[idx])
plt.title('denoised')

plt.show()
Find the MSE between the images
clean_noisy = mse(cifar10_test, cifar10_test_noisy)
clean_denoised = mse(cifar10_test, cifar10_test_denoised)
print("MSE (clean vs. noisy):", clean_noisy)
print("MSE (clean vs. denoised):", clean_denoised)
print("Reduction in MSE after denoising:", clean_noisy - clean_denoised)
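The absolute difference between the two MSE values depends on how strong the noise was, so a relative figure can be easier to compare across experiments. The following is a minimal sketch. The helper relative_reduction is a hypothetical name, and the numbers are illustrative only, not results from this project.

```python
def relative_reduction(mse_noisy, mse_denoised):
    """Fraction of the noisy-image error that denoising removed."""
    if mse_noisy <= 0:
        raise ValueError("mse_noisy must be positive")
    return (mse_noisy - mse_denoised) / mse_noisy


# Illustrative numbers only, not results from this article.
print(f"{relative_reduction(0.010, 0.004):.0%}")
```

A value close to 1 means that almost all of the noise-induced error was removed, while a negative value means the denoised images are further from the clean ones than the noisy inputs were.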
That’s all for now! You can now build your own autoencoders 😎😎. Explore more datasets and have fun training them.
Conclusion
Autoencoders are powerful and can do a lot more. Here I introduced you to two simple example models, and you can see how well they performed on the denoising task. There are other uses as well, such as applying an autoencoder to sequential data. Another variant is the Variational Autoencoder (VAE), a slightly more advanced and modern concept, which can also be used to generate images.
You can also check my Github repo for this project.
This completes our discussion on all three projects! 🥳
Special Thanks!
For this article, I would like to give special thanks to Vetrivel_PS, who regularly motivates me to write articles like this one, since they are very helpful to people who want to either make a transition into Data Science or excel in the field of Data Science.
If you would also like to join Vetrivel’s Data Science Community, you can go to the Hackweekly Linkedin Community (https://www.linkedin.com/company/thehackweekly) and join it.
About the Author
You can also check my previous blog posts.
Previous Data Science Blog posts.
Here is my Linkedin profile in case you want to connect with me. I’ll be happy to be connected with you.
For any queries, you can mail me on Gmail.
References :
Image 1 : https://towardsdatascience.com/fake-face-generator-using-dcgan-model-ae9322ccfd65
Image 2: https://www.google.co.in/url?sa=i&url=https%3A%2F%2Fwww.leewayhertz.com%2Fface-mask-detection-system%2F&psig=AOvVaw02aHbLLICPG-G31GnWPnwF&ust=1629113361404000&source=images&cd=vfe&ved=0CAsQjRxqFwoTCKivvrH2svICFQAAAAAdAAAAABAD
Image 3: https://www.google.co.in/url?sa=i&url=https%3A%2F%2Fwww.analyticsvidhya.com%2Fblog%2F2021%2F01%2Fauto-encoders-for-computer-vision-an-endless-world-of-possibilities%2F&psig=AOvVaw1LmmWrE4OS1_Cjr6QO9pmg&ust=1628144426721000&source=images&cd=vfe&ved=0CAsQjRxqFwoTCODNwvHclvICFQAAAAAdAAAAABAI
Image 4 : https://www.google.co.in/url?sa=i&url=https%3A%2F%2Fwww.researchgate.net%2Ffigure%2FThe-architecture-of-Unet_fig2_334287825&psig=AOvVaw2DB1XzXKEZAAf_mzzSwhlC&ust=1628144257443000&source=images&cd=vfe&ved=0CAsQjRxqFwoTCOi1zaHclvICFQAAAAAdAAAAABAD
End Notes
Thanks for reading!
I hope that you have enjoyed the article. If you liked it, share it with your friends as well. Is something not mentioned, or do you want to share your thoughts? Feel free to comment below and I’ll get back to you. 😉
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.