# Loss Function

The loss function quantifies how well or badly a particular model is doing with its predictions: the higher the loss/cost, the worse the model is doing.

With Neural Networks, we use the loss function to do Gradient Descent and backpropagation.

We can differentiate a loss function with respect to a particular set of weights, which is how we compute $dW$.
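A minimal sketch of computing $dW$ numerically (the helper name `numerical_grad_dW` and the toy quadratic loss are illustrative, not from any particular library; in practice frameworks compute this analytically via backpropagation):

```python
import numpy as np

def numerical_grad_dW(loss_fn, W, h=1e-5):
    """Estimate dW = dL/dW entry by entry with central differences.

    Slow but simple; useful as a gradient check against backprop."""
    dW = np.zeros_like(W)
    it = np.nditer(W, flags=["multi_index"])
    while not it.finished:
        i = it.multi_index
        old = W[i]
        W[i] = old + h          # nudge one weight up
        loss_plus = loss_fn(W)
        W[i] = old - h          # nudge it down
        loss_minus = loss_fn(W)
        W[i] = old              # restore
        dW[i] = (loss_plus - loss_minus) / (2 * h)
        it.iternext()
    return dW

# Toy loss L(W) = sum(W^2); the analytic gradient is 2W, so we can verify.
W = np.array([[1.0, -2.0], [0.5, 3.0]])
dW = numerical_grad_dW(lambda W: np.sum(W ** 2), W)
```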

### Score Function vs. Loss Function

The Loss Function takes the output of a Score Function and compares it against the ground truth value.

- A (parameterized) Score Function maps raw data to class scores (e.g. a linear function)
- a Loss Function quantifies the agreement between the predicted scores and the ground truth labels. We minimize the loss function with respect to the parameters of the score function.
- Defining a loss function is very important because it tells you how much to punish your model for making a mistake:
- L1 Distance is linear, so the penalty grows linearly with the size of the mistake
- L2 Distance is squared, so there is a big gap between the penalty for small and large mistakes
- In practice, different models use different implementations of loss functions
- See SVM for SVM Multi-Class loss function, which is a Hinge Loss
- Softmax Classifier
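A sketch of the two loss functions mentioned above, for a single example (function names are my own; the `delta = 1.0` margin is the common default for the multi-class SVM loss):

```python
import numpy as np

def svm_hinge_loss(scores, y, delta=1.0):
    """Multi-class SVM (hinge) loss for one example.

    scores: vector of class scores; y: index of the correct class.
    Penalizes any class whose score comes within `delta` of the correct one."""
    margins = np.maximum(0, scores - scores[y] + delta)
    margins[y] = 0  # the correct class contributes no loss
    return np.sum(margins)

def softmax_cross_entropy(scores, y):
    """Softmax classifier loss (cross-entropy) for one example."""
    shifted = scores - np.max(scores)  # shift for numeric stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[y]  # negative log probability of the true class

# e.g. scores from a linear score function W x + b
scores = np.array([3.2, 5.1, -1.7])
hinge = svm_hinge_loss(scores, y=0)
xent = softmax_cross_entropy(scores, y=0)
```

Note how the hinge loss is zero once every wrong class is beaten by the margin, while the cross-entropy loss always wants the true class probability pushed higher.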

The final loss combines the Loss Function with Regularization.
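A sketch of that combination, assuming a hinge data loss over a batch and an L2 penalty (the function name and `reg` strength are illustrative):

```python
import numpy as np

def full_loss(W, X, y, reg=1e-3):
    """Final loss = mean data loss over the batch + L2 regularization.

    W: (C, D) weights, X: (N, D) inputs, y: (N,) correct class indices."""
    scores = X @ W.T                                   # (N, C) class scores
    correct = scores[np.arange(len(y)), y][:, None]    # score of true class
    margins = np.maximum(0, scores - correct + 1.0)    # hinge margins
    margins[np.arange(len(y)), y] = 0
    data_loss = np.mean(np.sum(margins, axis=1))
    reg_loss = reg * np.sum(W * W)                     # L2 penalty on weights
    return data_loss + reg_loss
```

The regularization term depends only on the weights, not the data, so it pushes toward smaller weights regardless of how well the model fits.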

I don’t know if this is related, but I am also learning from Andrej Karpathy about:

- Negative Log Likelihood as a loss function
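It is related: negative log likelihood applied to softmax probabilities is exactly the softmax cross-entropy loss. A minimal sketch (the function name is my own):

```python
import numpy as np

def negative_log_likelihood(probs, y):
    """NLL for one example: -log of the probability assigned to the true class.

    probs: predicted class probabilities (must sum to 1); y: true class index."""
    return -np.log(probs[y])

probs = np.array([0.7, 0.2, 0.1])  # model's predicted class probabilities
nll = negative_log_likelihood(probs, y=0)
```

A confident correct prediction gives a small loss; assigning low probability to the true class blows the loss up toward infinity.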