Build Your First Neural Network: Part 1
This post is the first in a series introducing neural networks. If you find this article helpful, you can continue on to Part 2 and Part 3.
Welcome to my first post about Machine Learning. Machine Learning is radically changing the types of tasks that computers can solve.
I’ve recently been exposing myself to it by reading Grokking Deep Learning by Andrew Trask. I wanted to take some time today to explain the basics of machine learning and neural networks, then walk through an example of a one-neuron network.
The Basics
As with almost anything, starting with the basics and building a good foundation is critical for success. Machine Learning is no different. Some introductions start you with a machine learning library, or introduce the mathematics that drives neural networks. While those are helpful, I think starting with a simple example and building on that helps me learn faster.
First, let’s talk a little about neurons and neural networks.
Artificial Neurons
Both artificial neurons and neural networks are inspired by biology. The human brain is a biological neural network and has about 86 billion neurons.
Here is a diagram of a simple artificial neuron.
There’s an input (x), a weight (w) and an output (y). The formula for our simple neuron is y = xw.
If our weight is 0, then our output is 0.
If our weight is 1, then our output is the same as our input.
And if our weight is 2, then our output is double our input.
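In code, this neuron is nothing more than a multiplication. Here’s a quick sketch in Python (the neuron function and the input value of 3 are just for illustration):

```python
def neuron(x, w):
    # a single artificial neuron: multiply the input by the weight
    return x * w

print(neuron(3, 0))  # weight 0 -> output is 0
print(neuron(3, 1))  # weight 1 -> output equals the input
print(neuron(3, 2))  # weight 2 -> output is double the input
```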
So far, pretty simple stuff. Even as the definition of an artificial neuron gets more complex, what artificial neurons do doesn’t change.
Artificial neurons take an input, apply some transformation, and output the result of that transformation. In this case, the weight (w) is the only thing affecting that transformation. (Formal definitions also include a bias and an activation function.)
I’m not sure if one neuron is technically considered a neural network (because network implies more than one), but our single neuron can still learn things, as we’re about to demonstrate.
Forward Propagation
The flow of data through a neural network, from input through transformation to output, is called forward propagation. Forward propagation is predicting: the output (y) of our neural network is the prediction.
Backpropagation
More importantly, we want our neural network to make accurate predictions. A neural network learns how to make accurate predictions during its training phase.
While training our neural network, its predictions are measured against the correct answer. If the prediction is wrong, we modify the neural network in the hope that in the future it makes a better prediction.
This process of updating the neural network is called backpropagation.
In our example, backpropagation will update the single weight, hopefully allowing it to make a more accurate prediction in the future.
Our First Neural Network
Our first neural network is going to do something really simple: it’s going to learn how to output the input number.
So if we pass 1 as the input, it will predict 1. If we pass 2 as the input, it will predict 2.
Like I said, really simple. Our network will consist of only one neuron. Let’s start small and build up the necessary parts of our network.
First we make a single prediction.
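A minimal sketch of that prediction in Python (I’m assuming an input of 1 here, since that’s the value we’ll want the network to reproduce):

```python
input = 1   # the number we feed into the neuron
weight = 0  # the neuron's single weight, starting at 0

# forward propagation: multiply the input by the weight
pred = input * weight
print(pred)  # 0
```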
There’s an input, a weight, and our pred, which is 0. This is the forward propagation part. We wanted it to predict the same number as our input, but it didn’t do that. That means our neuron has some learning to do.
We know the answer is wrong, but how do we quantify that? We’ll introduce two new calculations: error and delta.
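Continuing the sketch from above, they might look something like this:

```python
goal_pred = 1  # the answer we want: the network should output its input

# error: how far off we are, squared so it is always positive
error = (pred - goal_pred) ** 2

# delta: the raw, signed difference between prediction and goal
delta = pred - goal_pred

print(error, delta)  # 1 -1
```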
Our error tells us how far away we are from the right answer. We introduce goal_pred, which is the answer that we’re looking for. Since error is calculated by squaring, it’s always a positive number. That makes it a good measurement of how accurate our prediction is, but not a good indicator of how we should update our network to make better predictions. That responsibility falls to delta.
Now delta will control how much we change our weight to hopefully get a better prediction in the future.
We now update our weight using delta.
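In its simplest form, the update is just a subtraction (again, a sketch of the idea rather than a full textbook formula):

```python
# backpropagation: move the weight in the opposite direction of the delta
weight = weight - delta
print(weight)  # 0 - (-1) = 1
```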
Now our weight has changed to 1, which is the value we are looking for. The weight is multiplied by the input, and if the weight is 1, our prediction is correct.
Let’s clean this up a bit, break out training and test data, and condense our code. You’ll notice everything is still there, just rearranged a bit.
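Here’s one way the condensed version might look; the particular training and test values are just an assumption for this sketch:

```python
train_data = [1, 2, 3]
test_data = [4, 5, 6]

weight = 0

# training: one pass over the training data
for input in train_data:
    goal_pred = input          # we want the network to output its input
    pred = input * weight      # forward propagation
    delta = pred - goal_pred   # how far off we are, and in which direction
    weight = weight - delta    # backpropagation: update the weight

# testing: with the weight at 1, the prediction matches the input
for input in test_data:
    print(input, input * weight)
```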
You’ll notice that this example “learns” how to return the input number from the training data and validates that with the test data.
Congrats, you just wrote your first neural network.
Complicating Things
This is obviously a simplistic example, and has a lot of holes in it. For instance, if we switch our test and training data we don’t get the answer that we’re looking for.
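Sticking with the sketch above, switching the data means training on the larger numbers:

```python
train_data = [4, 5, 6]  # previously our test data
test_data = [1, 2, 3]

weight = 0
for input in train_data:
    goal_pred = input
    pred = input * weight
    delta = pred - goal_pred
    weight = weight - delta

print(weight)  # ends up far from 1: the updates overshoot and diverge
```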
This time our neural network wasn’t anywhere close to the answer. What happened? Our first example worked perfectly but our second was way off!
There are several problems, but the one we want to address now is our learning rate.
We use delta to tell us how much to update our weight. If our delta is big, our weight can change too much, which leads us to overshoot our goal. Then every subsequent update takes us further and further away from it. This is what happened in our example: we’re not converging toward our optimal weight but diverging from it.
How can we control our learning rate? We’ll introduce another variable called alpha.
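With alpha in place, the training loop might look like this (the value of 0.1 and the number of passes are assumptions for this sketch):

```python
train_data = [4, 5, 6]
alpha = 0.1  # learning rate: scales how much each update changes the weight

weight = 0
for _ in range(20):  # several passes over the training data
    for input in train_data:
        goal_pred = input
        pred = input * weight
        delta = pred - goal_pred
        weight = weight - alpha * delta  # the update is now scaled by alpha

print(weight)  # converges toward 1
```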
Notice that the delta in our weight update is now scaled by our alpha. We have to iterate through our training data 18 times, but eventually our weight converges on the expected value of 1.
Final Thoughts
This post just scratches the surface when it comes to machine learning. There’s so much more to learn and this simplistic example doesn’t do justice to the kinds of problems that can be solved with machine learning.
I’ll post more about it in the future, so stay tuned.
If you found this helpful, you can continue on to Part 2 of this series.