Loss Function

The loss function quantifies how well or how badly a particular model is doing with its predictions: the higher the loss (or cost), the worse the model is doing.

With Neural Networks, we use the loss function to drive Gradient Descent and backpropagation.

We can differentiate a loss function with respect to a particular set of weights, which is how we compute the gradients used in Gradient Descent.
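As a minimal sketch of this idea, here is a numerical (central-difference) gradient of a loss with respect to a weight matrix, followed by one gradient-descent step. The squared-error loss, linear score function, and all numbers are hypothetical, chosen only for illustration:

```python
import numpy as np

def loss(W, x, y_true):
    scores = W @ x                          # score function: raw data -> scores
    return np.sum((scores - y_true) ** 2)   # squared-error loss (illustrative)

def numerical_gradient(W, x, y_true, h=1e-5):
    """Central-difference estimate of dLoss/dW, one weight at a time."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        orig = W[idx]
        W[idx] = orig + h
        plus = loss(W, x, y_true)
        W[idx] = orig - h
        minus = loss(W, x, y_true)
        W[idx] = orig                       # restore the weight
        grad[idx] = (plus - minus) / (2 * h)
    return grad

W = np.array([[0.5, -0.3], [0.2, 0.8]])
x = np.array([1.0, 2.0])
y_true = np.array([1.0, 0.0])

grad = numerical_gradient(W, x, y_true)
W -= 0.1 * grad                             # one gradient-descent step
```

In practice backpropagation computes this gradient analytically and far more efficiently; the finite-difference version above is mainly useful as a gradient check.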

Score Function vs. Loss Function

The Loss Function takes the output of a Score Function and measures its difference from the ground truth.

  • A (parameterized) Score Function maps raw data to class scores (e.g. a linear function)
  • Loss Function quantifies the agreement between the predicted scores and the ground truth labels. We minimize the loss function with respect to the parameters of the score function.
    • Defining a loss function is very important because it tells you how much to punish your model for making a mistake:
      • L1 Distance is linear, so small and large mistakes are penalized in direct proportion to their size
      • L2 Distance is squared, so there is a big gap between the penalties for small and large mistakes
      • In practice, different models use different implementations of loss functions
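The L1 vs. L2 contrast above can be seen with a few hypothetical error values; note how L2 blows up for the large error while L1 stays proportional:

```python
# Compare L1 (absolute) and L2 (squared) penalties on prediction errors.
errors = [1.0, 2.0, 10.0]

for e in errors:
    l1 = abs(e)    # L1 distance: grows linearly with the error
    l2 = e ** 2    # L2 distance: squared, punishes large errors much more
    print(f"error={e:5.1f}  L1={l1:6.1f}  L2={l2:7.1f}")
```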

The final loss combines the Loss Function with Regularization.
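A sketch of what that combination looks like, using a multiclass hinge (SVM) data loss as one common choice and an L2 penalty on the weights; the `reg` strength and all shapes here are hypothetical:

```python
import numpy as np

def full_loss(W, X, y, reg=0.1):
    """Total loss = average data loss (hinge) + L2 regularization on W."""
    scores = X @ W                                    # (N, C) class scores
    correct = scores[np.arange(len(y)), y][:, None]   # score of the true class
    margins = np.maximum(0, scores - correct + 1.0)   # hinge margins
    margins[np.arange(len(y)), y] = 0                 # ignore the true class
    data_loss = margins.sum() / len(y)                # average over examples
    reg_loss = reg * np.sum(W * W)                    # L2 regularization term
    return data_loss + reg_loss

X = np.array([[1.0, 0.0], [0.0, 1.0]])  # two toy examples
y = np.array([0, 1])                    # their true class labels
W = np.eye(2)                           # toy weight matrix
total = full_loss(W, X, y)
```

The regularization term depends only on the weights, not the data, so it trades a small increase in training loss for simpler weights.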

I don’t know if this is related, but I am also learning from Andrej Karpathy about: