Three Computer Vision Projects to Skyrocket your Data Science Career!

This article was published as a part of the Data Science Blogathon

Most people starting out in Data Science and Machine Learning quickly get bored if they don't get to play with interesting code in real-life projects, working through the different stages of the Data Science project lifecycle.

So, in this article, I explain 3 Data Science / Machine Learning projects with code. These projects are suitable for both Data Science beginners and practitioners, who can implement them and get their hands dirty with real project work.

  • Face Image Generation using Deep Convolutional Generative Adversarial Networks (DCGAN) with Pytorch
  • Develop and Deploy a Face Mask Detector System with OpenCV, Keras and StreamLit
  • Image Denoising Using AutoEncoders (Encoder-Decoder network) and U-Net architecture with Keras

Let’s start with the first one:

Face Image Generation using Deep Convolutional Generative Adversarial Networks (DCGAN) with Pytorch

Figure showing the architecture of Generative Adversarial Network (GAN)

Image 1 

In this project, we train a Deep Convolutional Generative Adversarial Network (DCGAN) on the CelebFaces Attributes (CelebA) dataset, with the objective of obtaining a Generator network that produces new images of human faces that look as real as possible. If you want to download the required dataset, use this link.

To learn about the theory behind GANs and DCGANs, you can refer to this article.

In short, we can define a GAN in the following way:

GANs can be understood as a two-player (i.e., Generator vs. Discriminator) non-cooperative game, where each player wishes to minimize its own cost function.
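Concretely, the two players optimize the value function from the original GAN paper (Goodfellow et al., 2014), where D(x) is the discriminator's estimate that x is real and G(z) is the image generated from noise z:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]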

Mount the Google Drive in Google Colab

from google.colab import drive
drive.mount('/content/drive')

In this project, we will be using a preprocessed version of the CelebA dataset in which each image has been cropped to remove the parts that don't include a face and then resized to a 64x64x3 NumPy image.

Unzipping the processed-CelebA-small zip

!unzip "/content/drive/MyDrive/processed-celeba-small.zip"

Set the data directory

data_dir = 'processed_celeba_small/'

Importing Necessary Dependencies or Libraries

import numpy as np
import matplotlib.pyplot as plt
import pickle as pkl
%matplotlib inline

Visualize the CelebA Data

This dataset contains more than 200,000 celebrity images with annotations (labels). These are colour images with 3 channels (RGB) each. In the x and y dimensions, each image will be a square tensor of size (image_size x image_size).

!pip install torch torchvision

Import Necessary Modules of Pytorch

import torch
from torchvision import datasets
from torchvision import transforms

Batch the Data Using a DataLoader

Now, to access the images in batches we will create a DataLoader.

def get_dataloader(batch_size, image_size, data_dir='processed_celeba_small/'):
    transform = transforms.Compose([transforms.Resize(image_size),
                                    transforms.ToTensor()])
    image_dataset = datasets.ImageFolder(data_dir, transform=transform)
    return torch.utils.data.DataLoader(image_dataset, batch_size=batch_size, shuffle=True)

DataLoader Hyperparameters

  • You can choose any reasonable value for the batch_size hyperparameter on your own.
  • However, your image_size must be 32. Resizing the data to a smaller image (fewer pixels) leads to faster model training while still producing convincing images of faces.
batch_size = 64 # hyperparameter
img_size = 32
# dataloader with batch_size and img_size
celeba_train_loader = get_dataloader(batch_size, img_size)

Convert the Tensor Images to NumPy Format and Transpose the Dimensions to Display the Images

def imshow(img):
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))

dataiter = iter(celeba_train_loader)
images, _ = next(dataiter)

# Plotting the images from a batch
fig = plt.figure(figsize=(20, 4))
plot_size = 20
for idx in np.arange(plot_size):
    ax = fig.add_subplot(2, plot_size // 2, idx + 1, xticks=[], yticks=[])
    imshow(images[idx])

Output:

output | Computer Vision Projects

Scaling the images to a range of -1 to 1 (assuming the input x is scaled to 0–1)

Before defining the model, we will write a function to scale the image data to the pixel range -1 to 1, which we will use during training. We do this because the output of a tanh-activated generator contains pixel values in the range -1 to 1, so we need to rescale our training images, currently in the range 0–1, to match.

def scale(x, feature_range=(-1, 1)):
    min_, max_ = feature_range
    x = x * (max_ - min_) + min_
    return x
img = images[0]
scaled_img = scale(img)
print('Min: ', scaled_img.min()) # check the range of the scaled img to be around -1 to 1
print('Max: ', scaled_img.max())

Defining the Model

A GAN is composed of two adversarial networks: a discriminator and a generator.

Discriminator

The discriminator is a convolutional classifier without max-pooling layers. Its inputs are 32x32x3 tensor images, and its output is a single value indicating whether an image is real or fake.

import torch.nn as nn
import torch.nn.functional as F
def conv(in_channels, out_channels, kernel_size, stride=2, padding=1, batch_norm=True):
    layers = []
    layers.append(nn.Conv2d(in_channels, out_channels, kernel_size,
                            stride=stride, padding=padding, bias=False))
    if batch_norm:
        layers.append(nn.BatchNorm2d(out_channels))
    return nn.Sequential(*layers)
class Discriminator(nn.Module):
    def __init__(self, conv_dim):
        """conv_dim - depth of the first convolutional layer"""
        super(Discriminator, self).__init__()
        self.conv_dim = conv_dim
        # 3 conv layers followed by a fully-connected layer
        self.conv1 = conv(3, conv_dim, 4, batch_norm=False)
        self.conv2 = conv(conv_dim, conv_dim*2, 4)
        self.conv3 = conv(conv_dim*2, conv_dim*4, 4)
        self.fc = nn.Linear(conv_dim*4*4*4, 1)

    def forward(self, x):
        """x - the input to the neural network; returns the discriminator logits"""
        y = F.leaky_relu(self.conv1(x), 0.2)
        y = F.leaky_relu(self.conv2(y), 0.2)
        y = F.leaky_relu(self.conv3(y), 0.2)
        out = y.view(-1, self.conv_dim*4*4*4)  # flattening
        out = self.fc(out)  # output layer
        return out

Generator

This component of the GAN learns to create fake data by incorporating feedback from the discriminator. It upsamples its input to generate new images of the same size as the training data, 32x32x3. The input is a latent vector of length z_size, and the output is an image of shape 32x32x3.

def deconv(in_channels, out_channels, kernel_size, stride=2, padding=1, batch_norm=True):
    layers = []
    layers.append(nn.ConvTranspose2d(in_channels, out_channels, kernel_size,
                                     stride=stride, padding=padding, bias=False))
    if batch_norm:
        layers.append(nn.BatchNorm2d(out_channels))
    return nn.Sequential(*layers)
class Generator(nn.Module):
    def __init__(self, z_size, conv_dim):
        """
        z_size - length of the input latent vector z
        conv_dim - depth of the input to the last transpose conv layer
        """
        super(Generator, self).__init__()
        self.conv_dim = conv_dim
        self.fc2 = nn.Linear(z_size, conv_dim*4*4*4)
        self.t_conv1 = deconv(conv_dim*4, conv_dim*2, 4)
        self.t_conv2 = deconv(conv_dim*2, conv_dim, 4)
        self.t_conv3 = deconv(conv_dim, 3, 4, batch_norm=False)

    def forward(self, x):
        """x - the latent vector; returns a 32x32x3 tensor image"""
        y = self.fc2(x)
        y = y.view(-1, self.conv_dim*4, 4, 4)
        z = F.relu(self.t_conv1(y))
        z = F.relu(self.t_conv2(z))
        z = torch.tanh(self.t_conv3(z))
        return z

Weight Initialization

To help the models converge as quickly as possible, we initialize the weights of the convolutional and linear layers based on the original DCGAN paper, which says: "All weights were initialized from a zero-centred Normal distribution with a standard deviation of 0.02."

def weights_init_normal(m):
    """Initialize weights from N(0, 0.02); m is a layer."""
    classname = m.__class__.__name__
    # note the parentheses: without them, Python's operator precedence
    # would make this condition behave incorrectly
    if hasattr(m, 'weight') and (classname.find('Conv') != -1 or classname.find('Linear') != -1):
        m.weight.data.normal_(0.0, 0.02)
        if m.bias is not None:  # conv layers here are created with bias=False
            m.bias.data.fill_(0)

Building the complete network

To build our network, we will define the model hyperparameters and instantiate the discriminator and generator from the classes defined in the Defining Model section.

# instantiate the discriminator and generator
def complete_network(d_conv_dim, g_conv_dim, z_size):
    D = Discriminator(d_conv_dim)
    G = Generator(z_size=z_size, conv_dim=g_conv_dim)
    # initialize the model weights
    D.apply(weights_init_normal)
    G.apply(weights_init_normal)
    print(D)
    print(G)
    return D, G

# model hyperparameter
d_conv_dim = 64
g_conv_dim = 64
z_size = 100
D, G = complete_network(d_conv_dim, g_conv_dim, z_size)
import torch
train_on_gpu = torch.cuda.is_available()
if not train_on_gpu:  # ensure training happens on GPU if available
    print('No GPU found. Training on CPU...')
else:
    print('GPU available. Training...')

Loss Calculation – Discriminator and Generator Losses

For the discriminator, the total loss is the sum of the losses on real and fake images: total_loss = real_image_loss + fake_image_loss. The discriminator should output 1 for real images and 0 for fake images. The generator loss pushes the generator to produce images that the discriminator classifies as real.

def real_loss(D_out):
    '''D_out - discriminator logits; returns the real-image loss'''
    batch_size = D_out.size(0)
    labels = torch.ones(batch_size) * 0.9  # one-sided label smoothing
    if train_on_gpu:
        labels = labels.cuda()
    criterion = nn.BCEWithLogitsLoss()  # binary cross-entropy with logits
    loss = criterion(D_out.squeeze(), labels)  # loss calculation
    return loss

def fake_loss(D_out):
    '''D_out - discriminator logits; returns the fake-image loss'''
    batch_size = D_out.size(0)
    labels = torch.zeros(batch_size)  # fake labels = 0
    if train_on_gpu:
        labels = labels.cuda()
    criterion = nn.BCEWithLogitsLoss()
    loss = criterion(D_out.squeeze(), labels)  # loss calculation
    return loss

import torch.optim as optim
# params
lr_d = 0.0002
lr_g = 0.0002
beta1= 0.5
beta2=0.999 #default
# Using Adam Optimizers
d_optimizer = optim.Adam(D.parameters(), lr_d, [beta1, beta2])
g_optimizer = optim.Adam(G.parameters(), lr_g, [beta1, beta2])

Model Training

During training, we alternate between the discriminator and the generator, using the real_loss and fake_loss functions above to compute their losses.

  • First, train the discriminator by alternating between real and fake images of human faces
  • Then the generator tries to trick the discriminator, and therefore has an opposing loss function
def train(D, G, n_epochs, print_every=50):
    '''
    D - the discriminator network
    G - the generator network
    n_epochs - number of epochs
    print_every - interval at which to print and record the models' losses
    returns the D and G losses
    '''
    if train_on_gpu:
        D.cuda()
        G.cuda()

    losses = []   # recorded losses
    samples = []  # generated "fake" samples

    # The data for sampling is fixed - constant throughout training -
    # which helps us inspect the performance of the model
    sample_size = 16
    fixed_z = np.random.uniform(-1, 1, size=(sample_size, z_size))
    fixed_z = torch.from_numpy(fixed_z).float()
    if train_on_gpu:
        fixed_z = fixed_z.cuda()

    for epoch in range(n_epochs):  # epoch loop
        for batch_i, (real_images, _) in enumerate(celeba_train_loader):  # batch train loop
            batch_size = real_images.size(0)
            real_images = scale(real_images)
            if train_on_gpu:
                real_images = real_images.cuda()

            # 1. Train the discriminator on real and fake images
            d_optimizer.zero_grad()
            out_real = D(real_images)
            d_loss_real = real_loss(out_real)

            z = np.random.uniform(-1, 1, size=(batch_size, z_size))
            z = torch.from_numpy(z).float()
            if train_on_gpu:
                z = z.cuda()
            fake_out = G(z)
            out_fake = D(fake_out)
            d_loss_fake = fake_loss(out_fake)

            d_loss = d_loss_real + d_loss_fake
            d_loss.backward()
            d_optimizer.step()

            # 2. Train the generator with an adversarial loss
            g_optimizer.zero_grad()
            z = np.random.uniform(-1, 1, size=(batch_size, z_size))
            z = torch.from_numpy(z).float()
            if train_on_gpu:
                z = z.cuda()
            fake_out_g = G(z)
            G_D_out = D(fake_out_g)
            g_loss = real_loss(G_D_out)  # the generator wants D to say "real"
            g_loss.backward()
            g_optimizer.step()

            if batch_i % print_every == 0:
                losses.append((d_loss.item(), g_loss.item()))  # record D and G losses
                # print the stats
                print('Epoch [{:5d}/{:5d}] | d_loss: {:6.4f} | g_loss: {:6.4f}'.format(
                    epoch + 1, n_epochs, d_loss.item(), g_loss.item()))

        # Generate samples from the fixed latent vectors after each epoch
        G.eval()
        samples_z = G(fixed_z)
        samples.append(samples_z)
        G.train()

    with open('train_samples.pkl', 'wb') as f:  # save the generated samples
        pkl.dump(samples, f)

    return losses

n_epochs = 30 # number of epoch
losses = train(D, G, n_epochs=n_epochs) #Training

Plotting the discriminator and generator loss after each epoch

fig, ax = plt.subplots()
losses = np.array(losses)
plt.plot(losses.T[0], label='Discriminator')
plt.plot(losses.T[1], label='Generator')
plt.title("Train Loss")
plt.legend()

Output:

Loss Plot | Computer Vision Projects

Generate Samples from Training

# View a list of generated samples
def view_samples(epoch, samples):
    fig, axes = plt.subplots(figsize=(20, 4), nrows=2, ncols=8, sharex=True, sharey=True)
    for ax, img in zip(axes.flatten(), samples[epoch]):
        img = img.detach().cpu().numpy()
        img = np.transpose(img, (1, 2, 0))
        img = ((img + 1) * 255 / 2).astype(np.uint8)  # rescale from [-1, 1] to [0, 255]
        ax.xaxis.set_visible(False)
        ax.yaxis.set_visible(False)
        im = ax.imshow(img.reshape((32, 32, 3)))

with open('train_samples.pkl', 'rb') as f:
    samples = pkl.load(f)
v_s = view_samples(-1, samples)

Output:

output | Computer Vision Projects

Conclusion

Looking at the output, you can see that our model was able to generate new images of fake human faces that look as realistic as possible. You may also notice that all the images are lighter in shade; even the brown faces are a bit lighter. This is because the CelebA dataset is somewhat biased: it consists of "celebrity" faces that are mostly white. In the end, our DCGAN model successfully produces nearly real images from mere noise.

You can also check out my GitHub repo for this project.

Let’s move to the second project: 👇

Develop and Deploy a Face Mask Detector System with OpenCV, Keras and StreamLit

Abstract: During the COVID-19 crisis, we have seen that wearing masks is absolutely necessary for public health and for controlling the spread of the pandemic. As machine learning students, what if we could build a system that monitors whether the people around us comply with these safety measures? In this project, we build a face mask detector that detects whether a person is wearing a mask or not, and we deploy the model as a web app so it can also be used in production.

Face Mask Detection System using AI | Computer Vision Projects

Image 2

Model Architecture and Training

In this project, I have made use of Transfer Learning, which makes the task very simple. I used the MobileNetV2 model to build my classifier network.

By using transfer learning, I take advantage of the feature-detection capabilities of the pre-trained MobileNetV2 and apply them to our rather simple classifier head. The MobileNetV2 base is followed by our own head consisting of AveragePooling, Flatten, Dense, and Dropout layers. As ours is a binary classification problem, the final layer has 2 neurons and softmax activation.

I also follow the general practice of using the Adam optimizer along with a cross-entropy loss (binary_crossentropy in the code below), as this combination of optimizer and loss function converges well on optimal weights for the network.

Import the necessary packages

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import AveragePooling2D
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import os

Form the argument parser and parse the arguments

ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True, help="path to input dataset")
ap.add_argument("-p", "--plot", type=str, default="plot.png", help="path to output loss/accuracy plot")
ap.add_argument("-m", "--model", type=str, default="mask_detector.model", help="path to output face mask detector model")
args = vars(ap.parse_args())

Initialize the learning rate, number of epochs, and batch size

INIT_LR = 1e-4
EPOCHS = 20
BS = 32

Grab the list of images in our dataset directory, then initialize the lists of data and class labels

print("[INFO] loading images...")
imagePaths = list(paths.list_images(args["dataset"]))
data = []
labels = []

Loop over the image paths

for imagePath in imagePaths:
    # Fetch the class label from the filename
    label = imagePath.split(os.path.sep)[-2]
    # Load the input image at 224x224 and preprocess it
    image = load_img(imagePath, target_size=(224, 224))
    image = img_to_array(image)
    image = preprocess_input(image)
    # Update the data and labels lists
    data.append(image)
    labels.append(label)

Convert the data and labels to NumPy arrays

data = np.array(data, dtype="float32")
labels = np.array(labels)

Perform one-hot encoding on the labels

lb = LabelBinarizer()
labels = lb.fit_transform(labels)
labels = to_categorical(labels)

Partition the data into training and testing splits, using 80% of the data for training and the remaining 20% for testing

(trainX, testX, trainY, testY) = train_test_split(data, labels, test_size=0.20, stratify=labels, random_state=42)

Form the training image generator for data augmentation

aug = ImageDataGenerator(
    rotation_range=20,
    zoom_range=0.15,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.15,
    horizontal_flip=True,
    fill_mode="nearest")

Load the MobileNetV2 network, ensuring the head Fully Connected layer sets are left off

baseModel = MobileNetV2(weights="imagenet", include_top=False, input_tensor=Input(shape=(224, 224, 3)))

Construct the head of the model that will be placed on top of the base model

headModel = baseModel.output
headModel = AveragePooling2D(pool_size=(7, 7))(headModel)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(128, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(2, activation="softmax")(headModel)

Place the head Fully Connected model on top of the base model (this becomes the actual model we will train)

model = Model(inputs=baseModel.input, outputs=headModel)

Loop over all the layers in the base model and freeze them so they will not be updated during the first training process

for layer in baseModel.layers:
    layer.trainable = False

Model Compilation

print("[INFO] compiling model...")
opt = Adam(learning_rate=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="binary_crossentropy", optimizer=opt, metrics=["accuracy"])

Training the head of the network

print("[INFO] training head...")
H = model.fit(
    aug.flow(trainX, trainY, batch_size=BS),
    steps_per_epoch=len(trainX) // BS,
    validation_data=(testX, testY),
    validation_steps=len(testX) // BS,
    epochs=EPOCHS)

Make predictions on the testing set

print("[INFO] evaluating network...")
predIdxs = model.predict(testX, batch_size=BS)

For each image in the testing set, try to find the index of the label with their corresponding largest predicted value of the probability

predIdxs = np.argmax(predIdxs, axis=1)

Print the Classification Report

print(classification_report(testY.argmax(axis=1), predIdxs, target_names=lb.classes_))

Output:

[Image: classification report]

Serialize the model to disk

print("[INFO] saving mask detector model...")
model.save(args["model"], save_format="h5")

Plot the training loss and accuracy

N = EPOCHS
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, N), H.history["accuracy"], label="train_acc")
plt.plot(np.arange(0, N), H.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig(args["plot"])

Output:

plot.png | Computer Vision Projects

Output for a Sample Test Image

In this section, I will show the kind of output you get after passing an image through our face mask detector system.
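The inference script itself isn't shown in the article, so here is a minimal sketch of how you might run the saved model on a single image, using an OpenCV Haar cascade to locate faces first. The image path, model path, and the (mask, without_mask) label order are my assumptions, not code from the original project:

# Hypothetical inference sketch (paths and label order are assumptions)
import cv2
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input

model = load_model("mask_detector.model")  # model saved by the training script
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("example.jpg")  # assumed test image path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.1, 4):
    # crop the face, convert BGR->RGB, and preprocess for MobileNetV2
    face = cv2.cvtColor(image[y:y+h, x:x+w], cv2.COLOR_BGR2RGB)
    face = cv2.resize(face, (224, 224)).astype("float32")
    face = preprocess_input(np.expand_dims(face, axis=0))
    (mask, without_mask) = model.predict(face)[0]  # assumed label order
    label = "Mask" if mask > without_mask else "No Mask"
    color = (0, 255, 0) if label == "Mask" else (0, 0, 255)
    cv2.rectangle(image, (x, y), (x+w, y+h), color, 2)
    cv2.putText(image, label, (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, color, 2)
cv2.imwrite("output.jpg", image)

The annotated result is saved to output.jpg; the image below shows the kind of output the detector produces.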

Output

Conclusion

Hurray, we have successfully 🥳 built our face mask detector system with an accuracy of around 99%. I have also created a web application for this model. You can get that code directly from my GitHub, build the app yourself, and use it in a production environment.
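The web app code lives on the GitHub repo rather than in this article, but a minimal Streamlit front end could look something like the sketch below. The file name (app.py), the model path, and the label order are assumptions, not the repo's actual code:

# app.py - hypothetical minimal Streamlit front end for the mask detector
import numpy as np
import streamlit as st
from PIL import Image
from tensorflow.keras.models import load_model
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input

model = load_model("mask_detector.model")  # assumed path of the saved model

st.title("Face Mask Detector")
uploaded = st.file_uploader("Upload a face image", type=["jpg", "jpeg", "png"])

if uploaded is not None:
    # resize to the 224x224 input size the classifier was trained on
    image = Image.open(uploaded).convert("RGB").resize((224, 224))
    st.image(image, caption="Input image")
    x = preprocess_input(np.expand_dims(np.array(image, dtype="float32"), axis=0))
    (mask, without_mask) = model.predict(x)[0]  # assumed label order
    label = "Mask" if mask > without_mask else "No Mask"
    st.write(f"Prediction: {label} ({max(mask, without_mask) * 100:.1f}% confidence)")

You would launch it locally with streamlit run app.py.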

You can also check out my GitHub repo for this project.

Let’s move to the third and final project: 👇

Image Denoising Using AutoEncoders (Encoder-Decoder network) and U-Net architecture with Keras

The general autoencoder architecture is shown below:

Auto Encoders For Computer Vision | Computer Vision Projects

Figure Showing the architecture of an autoencoder model

                                             Image 3

Before going directly into the code, I suggest you first go through this tutorial on autoencoders and then come back to this project, for a better grasp of both the theory and the practice.

Import Necessary Dependencies or Libraries

First, we import all the necessary Python libraries and modules that we are going to use in this implementation.

import numpy as np # Optimizing matrix operations
import matplotlib.pyplot as plt # Data Visualization
from tensorflow.keras.layers import Conv2D, Input, Dense, Reshape, Conv2DTranspose, Activation, BatchNormalization, ReLU, Concatenate, add, LeakyReLU
from tensorflow.keras.models import Model # Functional keras model
from tensorflow.keras.callbacks import ModelCheckpoint # To Save the model weights based on the validation error
from tensorflow.keras.datasets import cifar100, cifar10 # Required Datasets Used in this problem statement
from tensorflow.keras.optimizers import Adam # Adam optimizer to minimize the loss function

Load the CIFAR-100 Dataset From Keras Directly

For this project we will be using the famous CIFAR-100 dataset as input. We don't need to download it manually, as we can import it from the Keras library directly.

# Used the CIFAR-100 dataset
(train_data_clean, _), (test_data_clean, _) = cifar100.load_data(label_mode='fine')

Normalize our data between 0 and 1

Now, we scale our data down to the range [0, 1] to reduce the computations.

# To normalize our data, we will divide all the image pixels by float(255)
train_data_clean = train_data_clean.astype('float32') / 255.
test_data_clean = test_data_clean.astype('float32') / 255.

Add the Noise to the Input Images

Now, we need to add noise to generate the noisy images. To do this, we generate an array with the same dimensions as our images, filled with random values drawn from a normal (Gaussian) distribution with mean = 0; in the code below the standard deviation is 0.1.

To generate the normal distribution, we use np.random.normal(loc, scale, size), and then scale the noise by some factor, here 0.5. After adding noise, pixel values can fall out of range, so we clip them with np.clip(arr, arr_min, arr_max).

# Function to add noise to our images and clip the pixel values between 0 and 1
def add_noise_and_clip_data(data, noise_factor):
    noise = np.random.normal(loc=0.0, scale=0.1, size=data.shape)
    data = data + noise_factor * noise
    data = np.clip(data, 0., 1.)
    return data
train_data_noisy = add_noise_and_clip_data(train_data_clean, 0.5)
test_data_noisy = add_noise_and_clip_data(test_data_clean, 0.5)

Visualize a few training images alongside their noisy versions

Let's see what our training data looks like along with the corresponding noisy images.

rows = 2  # number of rows in the figure
cols = 8  # number of columns in the figure
f = plt.figure(figsize=(2*cols, 2*rows*2))  # defining the figure
for i in range(rows):
    for j in range(cols):
        f.add_subplot(rows*2, cols, (2*i*cols) + (j+1))  # noisy image
        plt.imshow(train_data_noisy[i*cols + j])
        plt.axis("off")
    for j in range(cols):
        f.add_subplot(rows*2, cols, ((2*i+1)*cols) + (j+1))  # clean image
        plt.imshow(train_data_clean[i*cols + j])
        plt.axis("off")
f.suptitle("Sample Training Data",fontsize=20)
plt.show()

Output:

sample training data | Computer Vision Projects

Define a Simple CNN Architecture

Here we define two functions, one for the convolution operation and the other for the deconvolution operation, which serve as the encoder and decoder blocks of our customized autoencoder model.

# Function to include the convolution layers in our model architecture
def conv_block(x, filters, kernel_size, strides=2):
    x = Conv2D(filters=filters, kernel_size=kernel_size, strides=strides, padding='same')(x)
    x = BatchNormalization()(x)
    x = ReLU()(x)
    return x

# Function to include the de-convolution layers in our model architecture
def deconv_block(x, filters, kernel_size):
    x = Conv2DTranspose(filters=filters, kernel_size=kernel_size, strides=2, padding='same')(x)
    x = BatchNormalization()(x)
    x = ReLU()(x)
    return x

Function to denoise the images given to the model

Now, we define our main function, the core of this problem statement: it builds the autoencoder that denoises the images we give it as input, which is the main objective of this project.

def denoising_autoencoder():
    den_inputs = Input(shape=(32, 32, 3), name='dae_input')
    conv_block1 = conv_block(den_inputs, 32, 3)
    conv_block2 = conv_block(conv_block1, 64, 3)
    conv_block3 = conv_block(conv_block2, 128, 3)
    conv_block4 = conv_block(conv_block3, 256, 3)
    conv_block5 = conv_block(conv_block4, 256, 3, 1)

    deconv_block1 = deconv_block(conv_block5, 256, 3)
    merge1 = Concatenate()([deconv_block1, conv_block3])
    deconv_block2 = deconv_block(merge1, 128, 3)
    merge2 = Concatenate()([deconv_block2, conv_block2])
    deconv_block3 = deconv_block(merge2, 64, 3)
    merge3 = Concatenate()([deconv_block3, conv_block1])
    deconv_block4 = deconv_block(merge3, 32, 3)

    final_deconv = Conv2DTranspose(filters=3, kernel_size=3, padding='same')(deconv_block4)
    den_outputs = Activation('sigmoid', name='dae_output')(final_deconv)
    return Model(den_inputs, den_outputs, name='dae')

Function Calling, Model Compilation, and Training

After defining the function, we call it to build the model, then compile and train it. You can choose hyperparameters such as epochs and batch_size on your own.

dae = denoising_autoencoder() # Function Calling
dae.compile(loss='mse', optimizer='adam') # Model Compilation
checkpoint = ModelCheckpoint('best_model.h5', verbose=1, save_best_only=True, save_weights_only=True) # Save the best weights
# Training or fitting the model
dae.fit(train_data_noisy, train_data_clean, validation_data=(test_data_noisy, test_data_clean), epochs=5, batch_size=128, callbacks=[checkpoint])

Load the best model weights and predict the denoised images using the trained model

Here, we load the best weights, which were saved to an .h5 file during training, and use them to predict the denoised outputs for the test dataset.

dae.load_weights('best_model.h5') # load the weights which we have saved in our previous section
test_data_denoised = dae.predict(test_data_noisy) # Predict the output images using the trained model of best weights

Print the original, noisy, and denoised version of an image

idx = 4
plt.subplot(1,3,1)
plt.imshow(test_data_clean[idx])
plt.title('original')
plt.subplot(1,3,2)
plt.imshow(test_data_noisy[idx])
plt.title('noisy')
plt.subplot(1,3,3)
plt.imshow(test_data_denoised[idx])
plt.title('denoised')
plt.show()

Output:

output | Computer Vision Projects

Evaluate the Model using MSE

Now, we will define a function to calculate the difference between two images. Here we use the mean squared error (MSE) as the metric.

def mse(image_1, image_2):
    return np.square(np.subtract(image_1, image_2)).mean()

noisy_clean_mse = mse(test_data_clean, test_data_noisy)        # MSE between clean and noisy test data
denoised_clean_mse = mse(test_data_denoised, test_data_clean)  # MSE between the model's denoised output and the clean test data
noisy_clean_mse, denoised_clean_mse                            # print both MSE values

Testing our DAE on the CIFAR10 dataset

Now that model training is complete, it's time to test our model.
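The article skips the step that loads CIFAR-10 and produces the cifar10_test_* variables used below, so here is a minimal sketch of that step, reusing the add_noise_and_clip_data function from above with the same 0.5 noise factor:

# Load CIFAR-10, normalize it, add noise, and denoise it with the trained DAE
# (this step is implied but not shown in the original walkthrough)
(_, _), (cifar10_test, _) = cifar10.load_data()
cifar10_test = cifar10_test.astype('float32') / 255.
cifar10_test_noisy = add_noise_and_clip_data(cifar10_test, 0.5)
cifar10_test_denoised = dae.predict(cifar10_test_noisy)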

Find the MSE between the clean images and their noisy and denoised versions

clean_noisy = mse(cifar10_test, cifar10_test_noisy)
clean_denoised = mse(cifar10_test, cifar10_test_denoised)
clean_noisy, clean_denoised
print("The difference between the two images is:", clean_noisy-clean_denoised)

Now, let's design an Encoder-Decoder network with skip connections (i.e., a U-Net-style architecture)

Skip connections play a very important role in any network where both convolution and deconvolution operations are performed: they help restore information that would otherwise be lost during downsampling and upsampling.

The U-Net architecture is shown below:


Figure showing the U-Net architecture

Image 4

Now, let’s start with the code portion:

size = 32
channel = 3
from tensorflow.keras.layers import Conv2D, Input, Dense, Dropout, MaxPool2D, UpSampling2D

# Encoder component of our autoencoder network
inputs = Input(shape=(size, size, channel))
x = Conv2D(32, 3, activation='relu', padding='same')(inputs)
x = BatchNormalization()(x)
x = MaxPool2D()(x)
x = Dropout(0.5)(x)
skip = Conv2D(32, 3, padding='same')(x) # skip connection for decoder
x = LeakyReLU()(skip)
x = BatchNormalization()(x)
x = MaxPool2D()(x)
x = Dropout(0.5)(x)
x = Conv2D(64, 3, activation='relu', padding='same')(x)
x = BatchNormalization()(x)
encoded = MaxPool2D()(x)
# Decoder Component of our autoencoder network
x = Conv2DTranspose(64, 3,activation='relu',strides=(2,2), padding='same')(encoded)
x = BatchNormalization()(x)
x = Dropout(0.5)(x)
x = Conv2DTranspose(32, 3, activation='relu',strides=(2,2), padding='same')(x)
x = BatchNormalization()(x)
x = Dropout(0.5)(x)
x = Conv2DTranspose(32, 3, padding='same')(x)
x = add([x,skip]) # adding skip connection
x = LeakyReLU()(x)
x = BatchNormalization()(x)
decoded = Conv2DTranspose(3, 3, activation='sigmoid',strides=(2,2), padding='same')(x)
autoencoder = Model(inputs, decoded)
# Compiling the model
autoencoder.compile(optimizer=Adam(learning_rate=0.001), loss='binary_crossentropy')
# Analyzing model summary
autoencoder.summary()

Training or Fitting the Model

# Fix the number of epochs and batch_size as hyperparameter
epochs = 25
batch_size = 256
history = autoencoder.fit(train_data_noisy, train_data_clean, epochs=epochs, batch_size=batch_size, shuffle=True, validation_data=(test_data_noisy, test_data_clean) )

Drawing the Loss vs Number of Epochs Curve

# Defining Figure
f = plt.figure(figsize=(10,7))
f.add_subplot()
# Adding Subplots
plt.plot(history.epoch, history.history['loss'], label = "loss") # Loss curve for training set
plt.plot(history.epoch, history.history['val_loss'], label = "val_loss") # Loss curve for validation set
plt.title("Loss Curve",fontsize=18)
plt.xlabel("Epochs",fontsize=15)
plt.ylabel("Loss",fontsize=15)
plt.grid(alpha=0.3)
plt.legend()
plt.savefig("Loss_curve_cifar10.png")
plt.show()

Select a few random test images

# Number of images to be selected
num_imgs = 48
rand = np.random.randint(1, test_data_noisy.shape[0] - num_imgs)
cifar_test_images = test_data_noisy[rand:rand + num_imgs]     # slice of noisy test images
cifar_test_denoised = autoencoder.predict(cifar_test_images)  # denoise the selected images

Visualize the selected test images with their denoised versions

rows = 4 # defining no. of rows in figure
cols = 12 # defining no. of columns in figure
cell_size = 1.5
f = plt.figure(figsize=(cell_size*cols, cell_size*rows*2))  # defining the figure
f.tight_layout()
for i in range(rows):
    for j in range(cols):
        f.add_subplot(rows*2, cols, (2*i*cols) + (j+1))  # noisy input
        plt.imshow(cifar_test_images[i*cols + j])
        plt.axis("off")
    for j in range(cols):
        f.add_subplot(rows*2, cols, ((2*i+1)*cols) + (j+1))  # denoised output
        plt.imshow(cifar_test_denoised[i*cols + j])
        plt.axis("off")
f.suptitle("Autoencoder Results - Cifar10",fontsize=18)
plt.show()

As before, we use the mean squared error to measure the difference between two images:

def mse(data_1, data_2):
    return np.square(np.subtract(data_1, data_2)).mean()

test_data_denoised = autoencoder.predict(test_data_noisy)  # denoise the test set with the new model
noisy_clean_mse = mse(test_data_clean, test_data_noisy)
denoised_clean_mse = mse(test_data_denoised, test_data_clean)
noisy_clean_mse, denoised_clean_mse

Predict the denoised version of an image

cifar10_test_denoised = autoencoder.predict(cifar10_test_noisy)

Print the original, noisy, and denoised version of an image

idx = 6
plt.subplot(1,3,1)
plt.imshow(cifar10_test[idx])
plt.title('original')
plt.subplot(1,3,2)
plt.imshow(cifar10_test_noisy[idx])
plt.title('noisy')
plt.subplot(1,3,3)
plt.imshow(cifar10_test_denoised[idx])
plt.title('denoised')
plt.show()

Find the MSE between the images

clean_noisy = mse(cifar10_test, cifar10_test_noisy)
clean_denoised = mse(cifar10_test, cifar10_test_denoised)
clean_noisy, clean_denoised
print("The difference between the two images is:", clean_noisy-clean_denoised)

That's all for now! You can now build your own autoencoders 😎😎. Explore more datasets and have fun training your own models.

Conclusion

Autoencoders are powerful and can do a lot more. Here I introduced you to two simple example models, and you can see how well they performed on the denoising task. There are other uses as well, such as applying autoencoders to sequential data. One such example is the Variational Autoencoder (VAE), a slightly more advanced and modern concept that can also be used to generate images.

You can also check out my GitHub repo for this project.

This completes our discussion on all three projects! 🥳

Special Thanks!

For this article, special thanks go to Vetrivel_PS, who regularly motivates me to write these kinds of articles, which are very helpful to anyone who wants to transition into Data Science or excel in the field.

If you also want to join Vetrivel's Data Science community, you can go to the Hackweekly LinkedIn community (https://www.linkedin.com/company/thehackweekly) and join it.

About the Author

You can also check out my previous Data Science blog posts.

Here is my LinkedIn profile in case you want to connect with me. I'll be happy to connect with you.

Email

For any queries, you can email me via Gmail.

References:

Image 1: https://towardsdatascience.com/fake-face-generator-using-dcgan-model-ae9322ccfd65

Image 2: https://www.leewayhertz.com/face-mask-detection-system/

Image 3: https://www.analyticsvidhya.com/blog/2021/01/auto-encoders-for-computer-vision-an-endless-world-of-possibilities/

Image 4: https://www.researchgate.net/figure/The-architecture-of-Unet_fig2_334287825

End Notes

Thanks for reading!

I hope you enjoyed the article. If you liked it, share it with your friends. Something not mentioned, or want to share your thoughts? Feel free to comment below and I'll get back to you. 😉

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.


Source: https://www.analyticsvidhya.com/blog/2021/09/three-computer-vision-projects-to-skyrocket-your-data-science-career/
