# Neural Network

A neural network is, simply put, a series of algorithms that is extremely good at recognizing underlying relationships (correlations) in a set of data through a process that mimics the way the human brain operates.

Neural Networks are simply really good function approximators.

Most Common Neural Net Mistakes:

- You didn't try to overfit a single batch first
- You forgot to toggle train/eval mode for the net
- You forgot .zero_grad() (in PyTorch) before .backward()
- You passed softmaxed outputs to a loss that expects raw logits
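The mistakes above can be checked against a minimal sketch of the "overfit a single batch" sanity check in PyTorch (the model, shapes, and optimizer here are arbitrary toy choices, not from any particular project):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model and one fixed batch (sizes are arbitrary assumptions)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()  # expects raw logits, NOT softmaxed outputs

x = torch.randn(16, 10)
y = torch.randint(0, 3, (16,))

model.train()                    # remember to toggle train/eval mode
for step in range(300):
    optimizer.zero_grad()        # clear gradients before .backward()
    loss = loss_fn(model(x), y)  # raw logits go straight into the loss
    loss.backward()
    optimizer.step()

# If the model can't drive the loss on one fixed batch toward zero,
# something in the pipeline is broken.
```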

See *A Recipe for Training Neural Networks* by Andrej Karpathy, published in 2019.

I wrote an article on it on Medium, see Coding a Neural Network.

Worth remembering: the reason we have Activation Functions is to introduce non-linearity. If everything were linear, we could collapse all of the linear layers into a single linear layer.
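This collapse is easy to verify numerically: composing two weight matrices is the same as applying their product once (a small NumPy sketch with arbitrary toy sizes):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))   # first linear layer (no activation)
W2 = rng.standard_normal((2, 4))   # second linear layer
x = rng.standard_normal(3)

two_layers = W2 @ (W1 @ x)         # forward pass through both layers
collapsed = (W2 @ W1) @ x          # one equivalent linear layer

# Identical output: without a non-linearity, depth buys nothing
assert np.allclose(two_layers, collapsed)
```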

### Name: Vanilla NN / Feedforward NN / MLP

When I was first learning this in 2018, they were just called feedforward NNs, meaning an architecture composed only of fully-connected layers, but it seems they now more commonly go by the name MLP (Multilayer Perceptron).

**Fully-connected layer** means that every node in the layer has weights connecting it to every single node in the previous and next layer.
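Concretely, a fully-connected layer is just a weight matrix with one entry per connection, so `n_in` inputs and `n_out` outputs give `n_in * n_out` weights (a sketch with toy sizes):

```python
import numpy as np

n_in, n_out = 5, 3
rng = np.random.default_rng(1)
W = rng.standard_normal((n_out, n_in))  # one weight per connection
b = np.zeros(n_out)                     # one bias per output node

x = rng.standard_normal(n_in)
output = W @ x + b                      # every output sees every input

# 5 inputs fully connected to 3 outputs = 15 weights
assert W.size == n_in * n_out
```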

### Concepts

### Types of Neural Networks

Also see Deep Learning.

### Steps

- Import the
- Import the **training set**, which serves as the **input layer**.
- **Forward propagate** the data from the **input layer** through the **hidden layer** to the **output layer**, where we get a predicted value **y**. Forward propagation is the process of multiplying each **input node** by a **weight** (initially random) and applying the **activation function**.
- Measure the **error** between the predicted value and the real value.
- **Backpropagate** the error and use **gradient descent** to modify the **weights** of the connections.
- Repeat these steps until the error is sufficiently **minimized**, by finding the optimal weights.
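The steps above can be sketched end-to-end for a tiny one-hidden-layer network (the toy data, sigmoid activation, squared error, and learning rate are all assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.standard_normal((8, 2))               # training set (input layer)
y = (X[:, :1] + X[:, 1:] > 0).astype(float)   # toy target to learn

W1 = rng.standard_normal((2, 4))              # input -> hidden weights
W2 = rng.standard_normal((4, 1))              # hidden -> output weights
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(2000):
    # Forward propagate: input layer -> hidden layer -> output layer
    h = sigmoid(X @ W1)
    y_hat = sigmoid(h @ W2)
    # Measure the error between the predicted and real values
    err = y_hat - y
    # Backpropagate the error through the sigmoid derivatives
    d2 = err * y_hat * (1 - y_hat)
    d1 = (d2 @ W2.T) * h * (1 - h)
    # Gradient descent: modify the weights of the connections
    W2 -= 0.5 * h.T @ d2
    W1 -= 0.5 * X.T @ d1

# After repeating the steps, the error on the training set is small
```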

## Vocabulary

**The input layer:** What the machine always knows. Ex: The banking behavior of a customer.

**The hidden layer:** Where the magic happens.

**The output layer:** What the machine will predict. Ex: Whether or not the customer will quit within the next 6 months.

**Node/Neuron:** A thing that holds a number. Usually represented by a circle in network diagrams.

**Gradient descent:** The algorithm that iteratively updates the weights of the connections in the direction that reduces the error, making the model's predictions more and more accurate.
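In its simplest form, gradient descent repeatedly steps opposite the derivative. A minimal sketch on the toy function f(w) = (w - 3)^2, whose minimum is at w = 3:

```python
w = 0.0        # starting guess
lr = 0.1       # learning rate (step size)

for _ in range(100):
    grad = 2 * (w - 3)   # derivative of (w - 3)^2
    w -= lr * grad       # step opposite the gradient

# w has converged very close to 3, the minimizer
```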

**Weights:** These are the things that get updated by the model to become more accurate after every iteration. They are represented by the connections formed between each neuron. Each connection has a different weight.