
Understand Transfer Learning Using TensorFlow.JS

This article was published as a part of the Data Science Blogathon

Introduction

In my opinion, Transfer Learning is the most promising technique to use on the client side with TensorFlow.js. Its big advantages over using the same technique on the server are that client data stays confidential and that the device sensors (camera, geolocation, etc.) are directly accessible.

How does Transfer Learning work?

The way Transfer Learning works is simple. A model is first trained on a large set of training data. During training, the neural network learns to extract a large number of useful characteristics (features) of the problem being solved. These features can serve as the basis for a new model, which is trained on a small amount of data for a more specific but similar problem (Figure 1). Thus, training can happen on devices with limited resources in relatively little time.

Figure 1 – Block diagram of Transfer Learning

To load a model into TensorFlow.js, a special JSON format is used, which in turn can be of two types: graph-model or layers-model (Figure 2).

Figure 2 – Classification of model serialization formats compatible with TensorFlowJS

The graph-model format is not suitable for Transfer Learning: such a model can only be used in the form in which it was loaded, without the possibility of retraining it on a new data sample or changing the topology to fit our needs. Therefore, below we will talk only about models in the layers-model format.

To load a pretrained model in layers-model format, TensorFlow.js provides the tf.loadLayersModel method in its API, which loads the model topology together with the weights obtained from a lengthy training process. The topology is stored in a JSON file, which also contains a weightsManifest field listing the paths to the binary files that hold all the weights of the trained network (for a complex network with millions of trained parameters, the weights can be split across several shard files).
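To make the file layout more concrete, here is a minimal sketch of the parts of a layers-model JSON file that matter for loading. The interfaces are simplified, and the shard file names and the weight entry are made-up examples, not taken from any particular model.

```typescript
// Simplified shape of a layers-model `model.json` file (illustrative only).
interface WeightSpec {
  name: string;    // e.g. "conv1/kernel"
  shape: number[];
  dtype: string;   // e.g. "float32"
}

interface WeightsGroup {
  paths: string[]; // binary shard files, relative to model.json
  weights: WeightSpec[];
}

interface LayersModelJson {
  modelTopology: unknown;          // description of the layers
  weightsManifest: WeightsGroup[];
}

// Collect every shard file that has to be fetched alongside model.json.
function listWeightShards(modelJson: LayersModelJson): string[] {
  return modelJson.weightsManifest.reduce<string[]>(
    (all, group) => all.concat(group.paths),
    []
  );
}

// Hypothetical example: one weight group split across two shard files.
const exampleModelJson: LayersModelJson = {
  modelTopology: {},
  weightsManifest: [
    {
      paths: ["group1-shard1of2.bin", "group1-shard2of2.bin"],
      weights: [{ name: "conv1/kernel", shape: [3, 3, 3, 8], dtype: "float32" }],
    },
  ],
};

console.log(listWeightShards(exampleModelJson));
```

The loader resolves these paths relative to the location of the JSON file, so the shard files need to be served from the same directory.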

With a special converter, you can convert models, such as those trained in Python using the Keras framework (now included as a sub-module in TensorFlow), to a TF.js compatible format.

After loading the model, we can modify and retrain it according to the requirements of the task. Since we want to reduce the time needed to train the model on a new data sample, it makes sense to “freeze” as many layers of the original model as possible. Freezing a layer moves all of its parameters from the trainable category to the non-trainable category. During training, the optimizer modifies only the trainable parameters, which can significantly reduce the training time.
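The effect of freezing can be illustrated without any framework. In the sketch below a layer is reduced to a name, a parameter count, and a trainable flag; the layer names and sizes are hypothetical.

```typescript
// Simplified stand-in for a layer (not the TensorFlow.js Layer class).
interface LayerInfo {
  name: string;
  paramCount: number;
  trainable: boolean;
}

// Return a copy of the layers with the first `count` layers (from the input) frozen.
function freezeFirstLayers(layers: LayerInfo[], count: number): LayerInfo[] {
  return layers.map((layer, index) =>
    index < count ? { ...layer, trainable: false } : { ...layer }
  );
}

// Number of parameters the optimizer would still update.
function countTrainableParams(layers: LayerInfo[]): number {
  return layers
    .filter((layer) => layer.trainable)
    .reduce((sum, layer) => sum + layer.paramCount, 0);
}

// Hypothetical network; names and sizes are made up for illustration.
const network: LayerInfo[] = [
  { name: "conv_1", paramCount: 1000, trainable: true },
  { name: "conv_2", paramCount: 5000, trainable: true },
  { name: "dense_1", paramCount: 300, trainable: true },
];

const partlyFrozen = freezeFirstLayers(network, 2);
console.log(countTrainableParams(network));      // 6300
console.log(countTrainableParams(partlyFrozen)); // 300
```

With the two convolutional layers frozen, only the 300 parameters of the last layer remain for the optimizer to update.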

A fair question then would be how many and which layers should we freeze. To understand this intuitively, let’s look at a typical neural network block diagram that deals with image classifications between N classes.

Figure 3 – Neural network topology for image classification between N classes

Such a network consists of a series of convolutional layers followed by several fully connected (dense) hidden layers at the end.

Recall that the purpose of convolutional layers is to extract characteristic features of images. The first convolutional layers (relative to the network input) extract the simplest patterns from the image: edges, contours, arcs. The next convolutional layers combine the patterns of the previous layer into more complex shapes and textures (circles, squares), which in turn can combine into part of a person’s face, a car wheel, etc. Everything here depends on the context of the problem being solved.

Suppose we take a model that recognizes cars and need to detect a human face in an image. The tasks seem completely different, yet the convolutional layers closest to the input will extract nearly identical features in both networks.

Figure 4 – Features allocated by convolutional layers at different levels of the neural network

Thus, layers should be frozen starting from the input of the neural network, and the more similar the new problem is to the problem the preloaded model solves, the more layers can be frozen.

Another equally important parameter for choosing how many layers to freeze is the size of the available dataset with which we are going to retrain the model.

For example, if we have only a small amount of training data for retraining the model, and the new task is similar to the task the preloaded network solved, it makes sense to freeze all convolutional layers. On the other hand, if we have a larger amount of training data, and the task the preloaded network solved is not similar to ours, we may not freeze any layers at all.

In the latter case, a natural question arises: if we do not freeze a single layer, does it make sense to load a pre-trained model with its weights at all? It does. When training a model from scratch, the framework initializes the weights of the neural network randomly, which makes training take longer than starting from pre-trained weights, in which, as noted above, the low-level layers will be approximately the same for different tasks.

To summarize: the larger the training sample available for retraining the model, the fewer layers we need to freeze.

You can see the visualization of the algorithm described above in Figure 5.

Figure 5 – Scheme of setting up a new neural network based on the already trained
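The rule of thumb described above can be sketched as a small decision function. The text spells out only two of the four combinations (small dataset with a similar task, large dataset with a different task); the answer for the two mixed cases is an assumption made here for illustration, and the type and function names are hypothetical.

```typescript
type TaskSimilarity = "similar" | "different";
type DatasetSize = "small" | "large";
type FreezeStrategy = "freeze-all-conv" | "freeze-early-conv" | "freeze-none";

// Heuristic only: picks how much of the pretrained network to freeze.
function chooseFreezeStrategy(
  similarity: TaskSimilarity,
  size: DatasetSize
): FreezeStrategy {
  // Little data, similar task: keep every convolutional layer as it is.
  if (similarity === "similar" && size === "small") {
    return "freeze-all-conv";
  }
  // Plenty of data, different task: retrain everything,
  // using the pretrained weights only as the starting point.
  if (similarity === "different" && size === "large") {
    return "freeze-none";
  }
  // ASSUMPTION: in the mixed cases, freeze only the layers closest to the
  // input, since those extract the most generic features.
  return "freeze-early-conv";
}

console.log(chooseFreezeStrategy("similar", "small"));   // freeze-all-conv
console.log(chooseFreezeStrategy("different", "large")); // freeze-none
```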

Implementing Transfer Learning Using Tensorflow.js

Our task is to develop one of the components of a rock-paper-scissors game. The component is an interface that asks the user to show a rock, scissors, and paper to the camera. After training, the component must determine on its own what the user is showing.

First, let’s take a look at the big picture and what we are going to do and analyze each step.

Figure 6 – Structural diagram for creating a new model based on the trained MobileNet model

Loading the trained MobileNet model and analyzing it

MobileNet is a model trained on the ImageNet dataset (a collection of over a million images, manually split between 1000 classes).

Let’s load the model using a custom React hook:

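import { useEffect, useState } from 'react';
import * as tf from '@tensorflow/tfjs';

// MODEL_URL (the address of the model.json file) is defined elsewhere
export default () => {
  const [model, setModel] = useState<tf.LayersModel>();
  useEffect(() => {
    // Load the topology and weights once, when the component mounts
    (async function init() {
      const model = await tf.loadLayersModel(MODEL_URL);
      setModel(() => model);
      model.summary(); // print the layer table to the console
    })();
  }, []);
  return model;
}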

Let’s look at the topology of the loaded network by calling model.summary() (Figure 7).

Figure 7 – Topology of the loaded model

The model consists of 88 layers, the input layer of which is represented by a tensor of dimensions [null, 224, 224, 3], accepting an image with dimensions of 224×224 pixels with three color channels. The output layer is a tensor with dimension [null, 1000].

The output tensor [null, 1000] holds one value per class. The training labels are so-called one-hot vectors, in which all values are equal to zero except for one, for example [0, 0, 1, 0, 0]. This means that with probability 1 the image belongs to the class with index 2 (indexing, as usual, starts from zero). When we use the model for prediction, the output vector represents a probability distribution over the classes, for example [0.07, 0.1, 0.03, 0.75, 0.05]. Note that the sum of all values is equal to 1, and that the model assigns the highest probability, 0.75, to the class with index 3.
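Reading the predicted class out of such a vector is just a search for the index of the largest value. A minimal sketch in plain TypeScript, using the example vector from the text (the function name is chosen here for illustration):

```typescript
// Index of the largest value; returns -1 for an empty array.
function argMax(values: number[]): number {
  let bestIndex = -1;
  let bestValue = -Infinity;
  values.forEach((value, index) => {
    if (value > bestValue) {
      bestValue = value;
      bestIndex = index;
    }
  });
  return bestIndex;
}

const probabilities = [0.07, 0.1, 0.03, 0.75, 0.05];
const probabilitySum = probabilities.reduce((sum, p) => sum + p, 0);

console.log(argMax(probabilities));       // 3
console.log(probabilitySum.toFixed(2));   // "1.00"
console.log(argMax([0, 0, 1, 0, 0]));     // 2 (the one-hot example)
```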

Since our task has only 3 classes, while the loaded model classifies between 1000 classes, we need to modify the topology of the model. Let’s draw a diagram of what we want to do.

Model layers can be roughly divided into groups responsible for:

– extracting image features

– classifying images between 1000 classes

For the new model, we want to keep all the layers responsible for feature extraction, drop all the layers responsible for classifying images between 1000 classes, and provide our own classifier instead, which will classify between 3 classes.

Figure 8 shows how the loaded layers are divided by purpose.

Figure 8 – Creating a new model based on an existing one

At this point we can decide whether it is worth freezing layers for our task. First, we do not want to force the user to create a large training sample: 30-50 images of each class should be enough. Second, the problem we are solving, classification between 3 classes of gesture images, is quite close to the problem of classifying images between 1000 classes, for which the model was already trained on a large set of images. Therefore, according to Figure 5, we can freeze all layers of the part of the MobileNet model that we keep for our needs:

// freeze all layers of MobileNet
for (const layer of pretrainedModel.layers) {
  layer.trainable = false;
}

Find the last layer in the loaded model responsible for extracting image features

To do this, we will use the tf.LayersModel.getLayer method, which finds a layer either by its index or by its unique name. Searching by name is preferable to searching by index. Note (Figure 7) that each layer of the loaded model has a unique name: input_1, conv_pw_13_bn, conv_pw_13_relu, etc. We are looking for the last convolutional layer in the model, which is the layer named conv_pw_13_relu. Thus, the code will look like this:

const truncatedLayer = pretrainedModel.getLayer('conv_pw_13_relu');
const truncatedLayerOutput = truncatedLayer.output as SymbolicTensor;

Create a new classifier to classify between the three classes

The classifier will consist of several layers (Figure 9):

– the first layer flattens the multidimensional tensor that we get from the MobileNet convolutional layers into a one-dimensional array, so that it is compatible with the fully connected (dense) layers used for classification;

– the second layer is a hidden fully connected layer with 100 neurons (this number is a so-called hyperparameter and can be tuned experimentally) and the ReLU activation function. ReLU is the most common activation function for adding non-linearity to a model;

– the third layer is a fully connected output layer with exactly 3 neurons, each of which determines the probability that the image belongs to one of the three classes: rock, scissors, paper. This layer uses the softmax activation function, the usual choice for the last layer in multi-class classification problems. It transforms a vector z of dimension K into a vector σ of the same dimension, where each value is a real number in the interval [0, 1] and the values sum to one. As mentioned above, each value σ_i is treated as the probability that the object belongs to class i.

Figure 9 – New layers for the model, responsible for the classification between 3 classes
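The softmax transformation described above is easy to write out by hand. The following is a plain TypeScript sketch for illustration, not the TensorFlow.js implementation; subtracting the largest input before exponentiating is a common trick to avoid overflow and does not change the result.

```typescript
// softmax(z)_i = exp(z_i) / sum_j exp(z_j)
function softmax(logits: number[]): number[] {
  const maxLogit = Math.max(...logits);
  const exps = logits.map((z) => Math.exp(z - maxLogit));
  const sum = exps.reduce((acc, e) => acc + e, 0);
  return exps.map((e) => e / sum);
}

// Hypothetical raw scores of the three output neurons.
const classScores = softmax([2.0, 1.0, 0.1]);

// Approximately [0.659, 0.242, 0.099]: every value lies in [0, 1]
// and together they sum to 1.
console.log(classScores.map((p) => p.toFixed(3)));
```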

The code will look like this:

function buildNewHead(inputShape: Shape, numClasses: number) {
  // Creates a 2-layer fully connected model
  return tf.sequential({
    layers: [
      tf.layers.flatten({
        name: 'flatten',
        inputShape: inputShape.slice(1),
      }),
      tf.layers.dense({ name: 'hidden_dense_1', units: 100, activation: 'relu' }),
      tf.layers.dense({ name: 'softmax_classification', units: numClasses, activation: 'softmax' })
    ]
  });
}
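To get a feel for the size of this new head, we can count its parameters by hand. The sketch below assumes, purely for illustration, that the truncated network outputs a feature map of shape [7, 7, 256] per image; the real shape depends on the MobileNet variant that was loaded and should be read from model.summary().

```typescript
// Number of values per example after flattening.
function flattenedSize(shape: number[]): number {
  return shape.reduce((product, dim) => product * dim, 1);
}

// A dense layer has one weight per (input, unit) pair plus one bias per unit.
function denseParamCount(inputs: number, units: number): number {
  return inputs * units + units;
}

const featureShape = [7, 7, 256]; // ASSUMED output shape of the last kept layer
const flatSize = flattenedSize(featureShape);
const hiddenParams = denseParamCount(flatSize, 100); // hidden layer, 100 neurons
const outputParams = denseParamCount(100, 3);        // output layer, 3 classes

console.log(flatSize);                    // 12544
console.log(hiddenParams);                // 1254500
console.log(outputParams);                // 303
console.log(hiddenParams + outputParams); // 1254803
```

Under this assumption almost all trainable parameters sit in the hidden dense layer, which is why its number of neurons is the main knob for trading accuracy against training time.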

Associating the New Classifier with the Layers of the Pretrained Model

In step 2, we found the last layer in the model graph that we want to keep in the new model. Let’s link this layer to the classifier created in step 3 (see Figure 6). For this, instances of the model and layer classes have an apply method that connects layers to each other. It will look like this:

const transferHead = buildNewHead(truncatedLayerOutput.shape, numClasses);
const newOutput = transferHead.apply(truncatedLayerOutput) as SymbolicTensor;

Creating a new model and compiling it

To create a new model based on an existing one, we will use tf.model, which takes an object with two required fields, inputs and outputs. Each field accepts a symbolic tensor or an array of symbolic tensors. Unlike a regular tensor, a symbolic tensor is only a description of a tensor: it carries the shape and data type but no data (a so-called placeholder).

Each layer in the model has two attributes, input and output, each represented by a symbolic tensor.

It is sometimes convenient to think of a layer as a regular JS function that takes an input parameter, performs some computation, and returns an output. The difference is that the input and output are symbolic tensors.

So, we need to define inputs and outputs for the new model. The input of the new model matches the input of the loaded model and is available via pretrainedModel.inputs.

The output is the symbolic tensor produced by the last layer of the new classifier, which is stored in the newOutput variable after calling the apply method in the previous step.

// build new model
const model = tf.model({
  inputs: pretrainedModel.inputs,
  outputs: newOutput
});

Let’s compile the model by specifying the optimizer and the loss function. A common choice of optimizer is tf.train.adam, and the usual loss function for multi-class classification is categorical cross-entropy:

model.compile({
  optimizer: tf.train.adam(0.0001), // Adam with a small learning rate
  loss: 'categoricalCrossentropy'
});
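Categorical cross-entropy compares the predicted probability vector with a one-hot label: loss = -Σ y_i · ln(p_i), so only the probability assigned to the true class contributes. The plain TypeScript sketch below illustrates the formula for a single example; it is not the TensorFlow.js implementation, and the class order (index 1 standing for "paper") is an assumption made for the example.

```typescript
// Encode a class index as a one-hot label, e.g. (1, 3) -> [0, 1, 0].
function oneHot(classIndex: number, numClasses: number): number[] {
  const label: number[] = [];
  for (let i = 0; i < numClasses; i++) {
    label.push(i === classIndex ? 1 : 0);
  }
  return label;
}

// Cross-entropy between a one-hot label and predicted probabilities.
function categoricalCrossentropy(label: number[], predicted: number[]): number {
  const epsilon = 1e-7; // guards against log(0)
  return label.reduce((loss, y, i) => {
    const p = Math.max(predicted[i] ?? 0, epsilon);
    return loss - y * Math.log(p);
  }, 0);
}

const paperLabel = oneHot(1, 3); // [0, 1, 0]
const confidentLoss = categoricalCrossentropy(paperLabel, [0.2, 0.7, 0.1]);
const unsureLoss = categoricalCrossentropy(paperLabel, [0.4, 0.3, 0.3]);

console.log(confidentLoss.toFixed(4)); // "0.3567", i.e. -ln(0.7)
console.log(unsureLoss.toFixed(4));    // "1.2040", i.e. -ln(0.3)
```

The more probability the model puts on the correct class, the smaller the loss, which is the quantity the optimizer minimizes during training.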

Complete Code for Creating a New Model Based on Pretrained MobileNet

All that remains is to add UX elements that let the user collect samples for the three types of images, and a button that starts the training process:

function App() {
  // skipped for brevity
  const train = async () => {
    if (model && controllerDataset.current.xs !== null && controllerDataset.current.ys !== null) {
      const { current: { xs, ys } } = controllerDataset;
      model.fit(xs, ys, {
        batchSize: 24,
        epochs: 20,
        shuffle: true,
        callbacks: {
          // show the current loss after every batch
          onBatchEnd: async (batch, logs) => {
            setLoss(() => (logs?.loss as number).toFixed(5));
          },
          onTrainEnd: () => {
            changeTrainingState(() => TRAINING_STATES.TRAINED);
          }
        }
      });
    }
  };
  return (
    <div>
      {/* skipped for brevity */}
      <button
        disabled={trainingState === TRAINING_STATES.TRAINING}
        onClick={() => {
          if (trainingState !== TRAINING_STATES.TRAINING) {
            changeTrainingState(() => TRAINING_STATES.TRAINING);
            train();
          }
        }}>
        Train
      </button>
    </div>
  );
}

After training completes, the model can be used to classify images from the camera:

const predict = async () => {
  if (webcamIterator && model) {
    const image = await webcamIterator.capture();
    // add a batch dimension and scale pixels from [0, 255] to roughly [-1, 1]
    const processedImage = tf.tidy(
      () => image.expandDims().toFloat().div(127).sub(1)
    );
    // tf.tidy disposes the intermediate prediction tensors
    const label = tf.tidy(
      () => tf.argMax(model.predict(processedImage) as Tensor, 1).dataSync()[0]
    );
    setActiveLabel(() => label);
    image.dispose();
    processedImage.dispose();
    requestAnimationFrame(() => predict());
  }
};
useEffect(() => {
  if (trainingState === TRAINING_STATES.TRAINED) {
    requestAnimationFrame(() => predict());
  }
}, [trainingState]);
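The preprocessing expression in predict, div(127).sub(1), rescales each pixel value before it reaches the network. Written for a single value in plain TypeScript (the function name is chosen here for illustration), the mapping looks like this:

```typescript
// Scale a pixel value from [0, 255] to roughly [-1, 1]: x / 127 - 1.
function normalizePixel(value: number): number {
  return value / 127 - 1;
}

console.log(normalizePixel(0));   // -1
console.log(normalizePixel(127)); // 0
console.log(normalizePixel(254)); // 1
// 255 lands slightly above 1 because the divisor is 127, not 127.5.
console.log(normalizePixel(255) > 1); // true
```

Whatever scaling is used at prediction time should match the scaling applied to the images the new classifier was trained on.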

Conclusion

The way Transfer Learning works is simple. A model is first trained on a large set of training data. During training, the neural network learns to extract a large number of useful characteristics (features) of the problem being solved. These features can serve as the basis for a new model, which is trained on a small amount of data for a more specific but similar problem. Thus, training can happen on devices with limited resources in relatively little time.

Image Sources:

Figure 1

Figure 2

Figure 3

Figure 4 https://image.slidesharecdn.com/150721cdbn-150720184716-lva1-app6892/95/convolutional-deep-belief-networks-for-scalable-unsupervised-learning-of-hierarchical-representations-41-638.jpg?cb=1437418192

Figure 5 https://image.slidesharecdn.com/150721cdbn-150720184716-lva1-app6892/95/convolutional-deep-belief-networks-for-scalable-unsupervised-learning-of-hierarchical-representations-41-638.jpg?cb=1437418192

Figure 6 https://image.slidesharecdn.com/150721cdbn-150720184716-lva1-app6892/95/convolutional-deep-belief-networks-for-scalable-unsupervised-learning-of-hierarchical-representations-41-638.jpg?cb=1437418192

Figure 7

Figure 8

Figure 9

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.


Source: https://www.analyticsvidhya.com/blog/2021/09/understand-transfer-learning-using-tensorflow-js/
