Neural Network

A neural network is, simply put, a series of algorithms that is extremely good at recognizing underlying relationships (correlations) in a set of data, through a process loosely inspired by the way the human brain operates.

Neural Networks are simply really good function approximators.

Most Common Neural Net Mistakes:

  1. You didn’t try to overfit a single batch first
  2. You forgot to toggle train/eval mode for the net
  3. You forgot .zero_grad() (in PyTorch) before .backward()
  4. You passed softmaxed outputs to a loss that expects raw logits

See A Recipe for Training Neural Networks by Andrej Karpathy, published in 2019.
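
Below is a minimal PyTorch training step that sidesteps all four mistakes; the model, data, and hyperparameters here are made-up placeholders, not a real setup.

```python
import torch
import torch.nn as nn

# Toy setup purely for illustration; sizes and learning rate are arbitrary.
model = nn.Linear(10, 3)                 # stand-in for a real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()          # expects raw logits, not softmaxed outputs
x = torch.randn(32, 10)                  # one single fixed batch
y = torch.randint(0, 3, (32,))

model.train()                            # mistake 2: set train/eval mode explicitly
for step in range(100):                  # mistake 1: try to overfit this one batch first
    optimizer.zero_grad()                # mistake 3: clear old gradients before .backward()
    logits = model(x)                    # raw logits...
    loss = loss_fn(logits, y)            # ...go straight into the loss (mistake 4)
    loss.backward()
    optimizer.step()
```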

I wrote an article about this on Medium; see Coding a Neural Network.

A note to remember: the reason we have Activation Functions is to introduce a non-linear property. Otherwise, if everything is linear, we can just collapse all of the linear layers into a single linear layer, as the sketch below shows.
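
A quick numpy demonstration of that collapse, with made-up layer sizes: two stacked linear layers with no activation between them are exactly one linear layer with merged parameters.

```python
import numpy as np

# Hypothetical sizes, chosen just to demonstrate the algebra.
rng = np.random.default_rng(0)
x = rng.normal(size=4)
W1, b1 = rng.normal(size=(5, 4)), rng.normal(size=5)
W2, b2 = rng.normal(size=(3, 5)), rng.normal(size=3)

# Two stacked linear layers, no activation in between...
two_layers = W2 @ (W1 @ x + b1) + b2

# ...are exactly equivalent to one linear layer with merged parameters.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

assert np.allclose(two_layers, one_layer)
```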

Name: Vanilla NN / Feedforward NN / MLP

When I first learned this in 2018, they were just called feedforward NNs, meaning an architecture composed only of fully-connected layers, but it seems they now more commonly go by the name MLP (for Multilayer Perceptron).

Fully-connected layer means that every node in the layer has weights connecting it to every single node in the previous and next layers.
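
In PyTorch terms, an MLP is just a stack of fully-connected (nn.Linear) layers with activations in between; a minimal sketch, with arbitrary example sizes:

```python
import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(784, 128),   # input layer -> hidden layer (fully connected)
    nn.ReLU(),             # non-linear activation (see the note above)
    nn.Linear(128, 10),    # hidden layer -> output layer (fully connected)
)
```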

Concepts

Types of Neural Networks

Also see Deep Learning.

Steps

  1. Import the training set, which serves as the input layer.
  2. Forward propagate the data from the input layer through the hidden layer to the output layer, where we get a predicted value y. Forward propagation is the process by which we multiply the inputs by the weights (initialized randomly at first) and apply the activation function.
  3. Measure the error between the predicted value and the real value.
  4. Backpropagate the error and use gradient descent to modify the weights of the connections.
  5. Repeat these steps until the error is sufficiently minimized, which means we have found the optimal weights (see the sketch after this list).
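
A from-scratch sketch of this whole loop on a toy problem (learning XOR); the layer sizes, learning rate, and iteration count are illustrative choices, and the gradients are the hand-derived ones for a sigmoid/MSE setup.

```python
import numpy as np

rng = np.random.default_rng(42)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # step 1: input layer
y = np.array([[0], [1], [1], [0]], dtype=float)              # real values

W1 = rng.normal(size=(2, 4))   # weights start random (step 2)
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))
b2 = np.zeros(1)
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(10000):
    # Step 2: forward propagation (multiply by weights, apply activation)
    h = sigmoid(X @ W1 + b1)
    y_pred = sigmoid(h @ W2 + b2)

    # Step 3: measure the error between predicted and real values
    error = y_pred - y

    # Step 4: backpropagate the error, then gradient descent on the weights
    grad_out = error * y_pred * (1 - y_pred)   # dLoss/d(output pre-activation)
    grad_h = (grad_out @ W2.T) * h * (1 - h)   # dLoss/d(hidden pre-activation)
    W2 -= lr * h.T @ grad_out
    b2 -= lr * grad_out.sum(axis=0)
    W1 -= lr * X.T @ grad_h
    b1 -= lr * grad_h.sum(axis=0)

# Step 5: after enough iterations the error is small
print(np.round(y_pred, 2))  # should be close to [[0], [1], [1], [0]]
```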

Vocabulary

The input layer: What the machine always knows. Ex: The banking behavior of a customer.
The hidden layer: Where the magic happens.
The output layer: What the machine will predict. Ex: Whether or not the customer will quit within the next 6 months.
Node/Neuron: A thing that holds a number. Typically represented by a circle in diagrams.
Gradient descent: The algorithm that makes the model's predictions more and more accurate by iteratively updating the weights of the connections.
Weights: The values that the model updates after every iteration to become more accurate. They are represented by the connections between neurons; each connection has its own weight.
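
Concretely, gradient descent nudges each weight against its gradient; a single worked update, with made-up numbers:

```python
learning_rate = 0.1   # step size (an illustrative value)
weight = 0.5          # current weight of one connection
gradient = 0.2        # dError/dWeight, as computed by backpropagation

weight = weight - learning_rate * gradient
print(weight)  # 0.48 -- the weight moved slightly in the direction that reduces the error
```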