How to build confidence from scratch

And learn a thing or two about weight initialization

Nov 1, 2019 · 8 min read

This is actually an assignment from Jeremy Howard’s fast.ai course, lesson 5. I’ve showcased how easy it is to build a Convolutional Neural Network from scratch using PyTorch. Today, let’s dig even deeper and see if we can write our own nn.Linear module. Why spend time writing your own PyTorch module when it has already been written by the devs over at Facebook?

Well, for one, you’ll gain a deeper understanding of how all the pieces are put together. By comparing your code with the PyTorch code, you will learn why and how these libraries are developed.

Also, once you’re done, you’ll have more confidence in implementing and using all these libraries, because you’ll know how things work. Nothing about them will be a mystery to you.

And last but not least, you’ll be able to modify/tweak these modules should the situation require. And this is the difference between a noob and a pro.

OK, enough of the motivation, let’s get to it.

Simple MNIST one layer NN as the backdrop

First of all, we need some ‘backdrop’ code to test whether and how well our module performs. Let’s build a very simple one-layer neural network to solve the good old MNIST dataset. The code snippet (run in a Jupyter Notebook) is below:

The code is fairly self-explanatory. We use the fast.ai library for this project: download the MNIST pickle file and unzip it, convert the data into PyTorch tensors, then wrap them in a fast.ai DataBunch object for training. Next, we create a simple neural network with only one Linear layer. We also write our own update function instead of using the torch.optim optimizers, since writing our own optimizers from scratch could be the next step of our PyTorch learning journey. Finally, we iterate through the dataset and plot the losses to see whether and how well it works.
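The original listing is not reproduced here, so below is a minimal sketch of the same idea in plain PyTorch. Mnist_Logistic and nn.Linear(784, 10, bias=True) come from this article; everything else is an assumption made for illustration: the update helper, the learning rate, the batch size, and above all the random stand-in data used in place of the MNIST pickle and the fast.ai DataBunch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Mnist_Logistic(nn.Module):
    """One-layer network: 784 input pixels -> 10 class scores."""

    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(784, 10, bias=True)

    def forward(self, xb):
        return self.lin(xb)


def update(model, xb, yb, lr=1e-2):
    """One hand-written SGD step (hypothetical helper, no torch.optim)."""
    loss = F.cross_entropy(model(xb), yb)
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p -= lr * p.grad  # plain gradient-descent step
            p.grad.zero_()    # reset gradients for the next batch
    return loss.item()


torch.manual_seed(0)
# Stand-in data (assumption): random "images" and labels instead of MNIST.
x = torch.randn(256, 784)
y = torch.randint(0, 10, (256,))

model = Mnist_Logistic()
# One pass over the data in batches of 64, collecting the loss per batch.
losses = [update(model, x[i:i + 64], y[i:i + 64]) for i in range(0, 256, 64)]
```

With real MNIST batches, the losses list is what you would plot to judge convergence.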

First Iteration: Just make it work

All PyTorch modules/layers extend torch.nn.Module.

Within the class, we’ll need an __init__ dunder function to initialize our linear layer and a forward function to do the forward calculation. Let’s look at the __init__ function first.

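We’ll use the official PyTorch documentation as a guideline to build our module. According to the documentation, nn.Linear takes three arguments: in_features (the size of each input sample), out_features (the size of each output sample), and bias (if False, the layer does not learn an additive bias; the default is True).

So the first thing our __init__ does is take in these three arguments.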

The class also needs to hold weight and bias parameters so it can be trained. We also initialize those.

Here we used torch.nn.Parameter to wrap our weight and bias; otherwise, they won’t be trained.

Also, note that we used torch.randn to initialize the parameters instead of what’s described in the documentation. This is not the best way to initialize weights, but our goal is to get it working first. We’ll tweak it in the next iteration.
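To see why the nn.Parameter wrapper matters, here is a small sketch comparing two toy modules. The class names WithParameter and WithPlainTensor are made up for this illustration. Only the wrapped tensor is registered with the module, so only it shows up in .parameters(), which is what a training loop iterates over.

```python
import torch
import torch.nn as nn


class WithParameter(nn.Module):
    def __init__(self):
        super().__init__()
        # Wrapped in nn.Parameter: registered and returned by .parameters()
        self.weight = nn.Parameter(torch.randn(10, 20))


class WithPlainTensor(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain tensor attribute: invisible to .parameters()
        self.weight = torch.randn(10, 20)


n_registered = len(list(WithParameter().parameters()))
n_plain = len(list(WithPlainTensor().parameters()))
print(n_registered, n_plain)  # 1 0
```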

OK, now that the __init__ part is done, let’s move on to forward function. This is actually the easy part:

We first get the shape of the input, figure out how many columns it has, then check whether that matches in_features. Then we do the matrix multiplication (note that we transpose the weights here so the dimensions line up) and return the result. We can test whether it works by giving it some data:

We have a 5×20 input, it goes through our layer and gets a 5×10 output. You should get results like this:
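Putting the first iteration together, a minimal sketch could look like the following. It follows the description above, but a few details are assumptions: raising ValueError on a size mismatch, adding the bias in forward, and ignoring the bias flag for now.

```python
import torch
import torch.nn as nn


class myLinear(nn.Module):
    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        # First iteration: the bias flag is accepted but not honoured yet.
        self.weight = nn.Parameter(torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.randn(out_features))

    def forward(self, input):
        _, n_cols = input.shape
        if n_cols != self.in_features:
            raise ValueError(
                f"expected {self.in_features} input features, got {n_cols}"
            )
        # weight is (out_features, in_features), so transpose before multiplying
        return input @ self.weight.t() + self.bias


lin = myLinear(20, 10)
out = lin(torch.randn(5, 20))
print(out.shape)  # torch.Size([5, 10])
```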

OK, now go back to our neural network code and find the Mnist_Logistic class. Change self.lin = nn.Linear(784, 10, bias=True) to self.lin = myLinear(784, 10, bias=True). Run the code and you should see something like this plot:

As you can see, it doesn’t converge very well (a loss of around 2.5 after one epoch). That’s probably because of our poor initialization. Also, we didn’t take care of the bias part. Let’s fix that in the next iteration. The final code for iteration 1 looks like this:

Second iteration: Proper weight initialization and bias handling

We’ve handled __init__ and forward, but remember that we also have a bias argument: if it is False, the layer should not learn an additive bias. We have not implemented that yet. Also, we used torch.randn to initialize the weight and bias, which is not optimal. Let’s fix both. The updated __init__ function looks like this:

First of all, when we create the weight and bias parameters, we no longer initialize them as we did in the last iteration. We just allocate a regular Tensor object for each. The actual initialization is done in another function, reset_parameters (explained below).

For bias, we added a condition: if it is True, we do what we did in the last iteration; if it is False, we call register_parameter('bias', None) to set it to None. The reset_parameters function looks like this:

The above code is taken directly from the PyTorch source code. The weight initialization PyTorch uses is called kaiming_uniform_. It comes from the paper “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification” (He, K. et al., 2015).
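Since the listings themselves are not reproduced here, the following is a sketch of the updated __init__ and reset_parameters. It is modeled on how torch.nn.Linear looked in PyTorch 1.x, so treat it as an approximation and compare it with the source of the version you have installed. Two points to note: torch.empty stands in for the "regular Tensor" allocation, and init._calculate_fan_in_and_fan_out is a private PyTorch helper that may change between releases.

```python
import math

import torch
import torch.nn as nn
from torch.nn import init


class myLinear(nn.Module):
    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        # Allocate storage only; values are filled in by reset_parameters().
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        if bias:
            self.bias = nn.Parameter(torch.empty(out_features))
        else:
            # Registers the name so that self.bias exists and is None.
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self):
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            # Private helper (assumption: still present in your version).
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)


layer = myLinear(784, 10, bias=False)
print(layer.bias)  # None
```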

What it does is draw the weights from a uniform distribution on [-bound, bound], centered at 0, where bound shrinks as the number of inputs to the layer (fan_in) grows. This helps avoid vanishing/exploding gradients (though we only have one layer here, we should still keep multi-layer networks in mind when writing the Linear class).

Notice that for self.weight, we actually give a the value math.sqrt(5) instead of math.sqrt(fan_in). This is explained in a GitHub issue in the PyTorch repo, for those who are interested.
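To make the effect of a = math.sqrt(5) concrete, here is a small check. It assumes kaiming_uniform_ uses its default leaky_relu gain, sqrt(2 / (1 + a^2)), and a bound of gain * sqrt(3 / fan_in). With a = sqrt(5) the gain is sqrt(1/3), so the bound collapses to 1 / sqrt(fan_in). The layer sizes are the ones from our MNIST example.

```python
import math

import torch
from torch.nn import init

fan_in, fan_out = 784, 10
a = math.sqrt(5)

gain = math.sqrt(2.0 / (1 + a ** 2))    # leaky_relu gain, equals sqrt(1/3)
bound = gain * math.sqrt(3.0 / fan_in)  # simplifies to 1 / sqrt(fan_in)

torch.manual_seed(0)
weight = torch.empty(fan_out, fan_in)
init.kaiming_uniform_(weight, a=a)

print(math.isclose(bound, 1 / math.sqrt(fan_in)))  # True
# Largest sampled weight; it should not exceed the bound (up to rounding).
largest = weight.abs().max().item()
```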

Also, we can add an extra_repr method so that printing the model shows the layer’s settings:

The final model looks like this:
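As the final listing is missing from this copy, here is a sketch of what the complete second-iteration module could look like, combining the pieces described above. As before, torch.empty, the ValueError, and the exact extra_repr format are assumptions modeled on PyTorch 1.x rather than the original listing.

```python
import math

import torch
import torch.nn as nn
from torch.nn import init


class myLinear(nn.Module):
    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        if bias:
            self.bias = nn.Parameter(torch.empty(out_features))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self):
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)

    def forward(self, input):
        _, n_cols = input.shape
        if n_cols != self.in_features:
            raise ValueError(
                f"expected {self.in_features} input features, got {n_cols}"
            )
        output = input @ self.weight.t()
        if self.bias is not None:
            output = output + self.bias  # only add a bias if one was created
        return output

    def extra_repr(self):
        # Shown inside the parentheses when the module is printed.
        return 'in_features={}, out_features={}, bias={}'.format(
            self.in_features, self.out_features, self.bias is not None
        )


layer = myLinear(784, 10)
print(layer)  # myLinear(in_features=784, out_features=10, bias=True)
```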

Rerun the code and you should see this plot:

We can see it converges much faster to a 0.5 loss in one epoch.

Conclusion

I hope this helps clear some of the clouds around these PyTorch nn modules. It might seem boring and redundant, but sometimes the fastest (and shortest) way is the ‘boring’ way. Once you get to the very bottom of this, the feeling of knowing that there’s nothing ‘more’ is priceless. You’ll come to the realization that:

Underneath PyTorch, there’s no trick, no myth, no catch, just rock-solid Python code.

Also, by writing your own code and then comparing it with the official source code, you’ll be able to see where the differences are and learn from the best in the industry. How cool is that?