Loss Function

The loss function defines how well or how badly a particular model is doing with its predictions. The higher the loss (also called the cost), the worse the model is doing.

With Neural Networks, we use the loss function to drive Gradient Descent and backpropagation.

We can differentiate the loss function with respect to a particular set of weights, which is how we compute the gradient used to update them.
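As a minimal sketch of that idea (assuming a linear model and a mean squared error loss, which are illustrative choices here, not anything specific from these notes):

```python
import numpy as np

# Sketch: differentiate a loss with respect to the weights of a linear
# model. The model (preds = X @ W) and the MSE loss are illustrative choices.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # 5 examples, 3 features
y = rng.normal(size=(5, 1))   # ground-truth targets
W = rng.normal(size=(3, 1))   # the weights we differentiate with respect to

def loss(W):
    preds = X @ W                      # score function: linear
    return np.mean((preds - y) ** 2)   # mean squared error

# Analytic gradient of the MSE loss: dL/dW = (2/N) * X^T (XW - y)
grad_analytic = (2 / X.shape[0]) * X.T @ (X @ W - y)

# Numerical check: nudge each weight and see how the loss changes
h = 1e-5
grad_numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    W_plus, W_minus = W.copy(), W.copy()
    W_plus[i] += h
    W_minus[i] -= h
    grad_numeric[i] = (loss(W_plus) - loss(W_minus)) / (2 * h)

print(np.allclose(grad_analytic, grad_numeric, atol=1e-6))  # True
```

Gradient descent then steps the weights in the direction that lowers the loss: `W -= learning_rate * grad_analytic`.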

Score Function vs. Loss Function

The Loss Function takes the output of a Score Function and measures how far it is from the ground-truth value.

  • A (parameterized) Score Function maps raw data to class scores (e.g. a linear function)
  • A Loss Function quantifies the agreement between the predicted scores and the ground truth labels. We minimize the loss function with respect to the parameters of the score function.
    • Defining a loss function is very important because it tells you how much to punish your model for making a mistake:
      • L1 Distance is linear, so small and large mistakes are penalized in proportion to their size
      • L2 Distance is squared, so there is a big gap between the penalty for small mistakes and for large ones (see the sketch after this list)
      • In practice, different models use different loss functions
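To make the L1-vs-L2 point concrete, here is a tiny sketch (the error values are made up for illustration):

```python
import numpy as np

# How L1 (absolute) and L2 (squared) distances punish a small mistake
# versus one that is 10x larger (illustrative values):
errors = np.array([0.5, 5.0])

l1_penalty = np.abs(errors)   # linear: 10x the error -> 10x the penalty
l2_penalty = errors ** 2      # squared: 10x the error -> 100x the penalty

print(l1_penalty)  # [0.5 5. ]
print(l2_penalty)  # [ 0.25 25.  ]
```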

The final loss combines the Loss Function with Regularization.
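Concretely, if $L_i$ is the data loss on example $i$, the full loss is usually written as the average data loss plus a regularization penalty (this is the CS231n-style formulation):

$$
L = \underbrace{\frac{1}{N}\sum_{i=1}^{N} L_i}_{\text{data loss}} + \underbrace{\lambda R(W)}_{\text{regularization}}
$$

where $R(W)$ is a penalty on the weights themselves (e.g. the squared L2 norm) and $\lambda$ is a hyperparameter controlling how strongly large weights are punished.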

I don’t know if this is related, but I am also learning from Andrej Karpathy about:

Concepts