Regression

Regression is a statistical technique that relates a dependent variable to one or more independent (explanatory) variables. You predict a continuous value, rather than discrete classes as in Classification.

We can use L1 and L2 distance to solve regression problems (e.g., nearest-neighbour-style prediction). In Stanford CS231n, the analogous setup was a Classification problem, where you used the SVM loss to predict the class.
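A minimal sketch of what distance-based regression can look like (k-nearest-neighbour style: predict by averaging the targets of the closest training points; the function name and data here are illustrative, not from the notes):

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=2, p=2):
    """Predict a continuous value by averaging the targets of the k nearest
    training points, using L1 (p=1) or L2 (p=2) distance."""
    dists = np.sum(np.abs(X_train - x_query) ** p, axis=1) ** (1 / p)
    nearest = np.argsort(dists)[:k]          # indices of the k closest points
    return y_train[nearest].mean()

X_train = np.array([[0.0], [1.0], [2.0], [10.0]])
y_train = np.array([0.0, 1.0, 2.0, 10.0])
# query 1.1 is closest to x=1 and x=2, so the prediction averages their targets
pred = knn_regress(X_train, y_train, np.array([1.1]), k=2)   # -> 1.5
```

This is the regression counterpart of the L1/L2 nearest-neighbour classifier from CS231n: same distance computation, but you average neighbour targets instead of taking a majority vote.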

Practice implementing it: - https://www.deep-ml.com/problems/14

Linear Regression

Method 1: Using MLE

Simple linear regression model: Alternate Formulation

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$

We use data $(x_1, y_1), \dots, (x_n, y_n)$ to estimate $\beta_0$, $\beta_1$, and $\sigma^2$.

The Likelihood Function is given by

$$L(\beta_0, \beta_1, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\right)$$

We come up with the line of best fit using MLEs. We get the following results (derivation is at page 402) for the estimates of the parameters:

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

The line of best fit is given by

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

If $\beta_1 = 0$, then $x$ has no predictive power for $Y$.
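These closed-form estimates are easy to sketch in code (NumPy; the function name and sample data are my own, for illustration):

```python
import numpy as np

def simple_ols(x, y):
    """Closed-form MLE / least-squares estimates for simple linear regression."""
    x_bar, y_bar = x.mean(), y.mean()
    Sxx = np.sum((x - x_bar) ** 2)
    Sxy = np.sum((x - x_bar) * (y - y_bar))
    beta1 = Sxy / Sxx                 # slope: S_xy / S_xx
    beta0 = y_bar - beta1 * x_bar     # intercept: y-bar - slope * x-bar
    return beta0, beta1

# data that lies almost exactly on y = 0.15 + 1.95 x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])
beta0, beta1 = simple_ols(x, y)       # -> (0.15, 1.95)
```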

Method 2: Least Squares

I don’t think the teacher went too in depth for this… Least squares picks $\beta_0, \beta_1$ to minimize the sum of squared errors $\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$, and it ends up with the same final estimates as the MLE approach.

We are relying on the Gauss-Markov Theorem: under its assumptions, the least-squares estimators are the best linear unbiased estimators (BLUE).

We want to ask: if $\beta_1 = 0$, then $x$ has no predictive power for $Y$. So suppose $H_0: \beta_1 = 0$ and $H_1: \beta_1 \neq 0$.

You do hypothesis testing, where the test statistic is given by

$$t = \frac{\hat{\beta}_1 - \beta_1}{s / \sqrt{S_{xx}}}, \qquad s^2 = \frac{\text{SSE}}{n - 2}$$

  • Note that $\beta_1 = 0$ (since this is the hypothesis we are testing). And then you use your t-table, where your Degrees of Freedom is $n - 2$.
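A quick sketch of computing this test statistic (NumPy; function name and data are illustrative — you would still compare $t$ against a t-table with $n-2$ degrees of freedom):

```python
import numpy as np

def slope_t_test(x, y):
    """t statistic for H0: beta1 = 0 in simple linear regression, df = n - 2."""
    n = len(x)
    x_bar, y_bar = x.mean(), y.mean()
    Sxx = np.sum((x - x_bar) ** 2)
    beta1 = np.sum((x - x_bar) * (y - y_bar)) / Sxx
    beta0 = y_bar - beta1 * x_bar
    residuals = y - (beta0 + beta1 * x)
    s2 = np.sum(residuals ** 2) / (n - 2)   # SSE / (n - 2), estimates sigma^2
    se = np.sqrt(s2 / Sxx)                  # standard error of beta1-hat
    return beta1 / se, n - 2                # t statistic, degrees of freedom

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])
t, df = slope_t_test(x, y)   # t = 39.0 with df = 3: strong evidence beta1 != 0
```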

Other

From the deep-ml practice, they just use a formula: the normal equation.

In linear regression, you’re trying to find parameters ($\theta$) that make predictions: $\hat{y} = X\theta$, where:

  • $X$ is your design matrix (rows = training examples, columns = features, often with a column of ones for the bias term).
  • $y$ is the vector of actual target values.
  • $\theta$ is the vector of parameters (weights) you’re solving for.

The normal equation is derived by minimizing the cost function:

$$J(\theta) = \|X\theta - y\|^2 = (X\theta - y)^\top (X\theta - y)$$

You take the derivative with respect to $\theta$, set it to zero, and solve. That derivation gives you:

$$\theta = (X^\top X)^{-1} X^\top y$$
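The middle step of that derivation, sketched out (standard matrix calculus for the squared-error cost):

```latex
\nabla_\theta J(\theta) = 2 X^\top (X\theta - y) = 0
\;\Longrightarrow\; X^\top X \theta = X^\top y
\;\Longrightarrow\; \theta = (X^\top X)^{-1} X^\top y
```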

So in short:

  • $\theta = (X^\top X)^{-1} X^\top y$ is the closed-form solution for the weights of your linear regression model that minimize the squared error.
  • It “came from” solving for the best parameters that make your prediction line (or hyperplane) fit the data as best as possible.
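A minimal sketch of the normal equation in code (assuming NumPy; the function name is my own). Solving the linear system $X^\top X \theta = X^\top y$ is preferred over explicitly inverting $X^\top X$:

```python
import numpy as np

def normal_equation(X, y):
    """Solve for theta minimizing ||X @ theta - y||^2 in closed form.

    Uses np.linalg.solve on X^T X theta = X^T y, which is more numerically
    stable than computing the inverse of X^T X directly.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)

# one feature, no bias column: y = 2x exactly, so theta should be ~[2.0]
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
theta = normal_equation(X, y)
```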

Normally, a linear regression model looks like this:

$$\hat{y} = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n$$

To express this in matrix form, we add a column of ones to $X$, so the first weight $\theta_0$ acts as the bias:

In the deep-ml problem, the feature matrix doesn’t include a column of ones, which means the model can’t encode a bias (intercept) term.

“A practical implementation involves augmenting $X$ with a column of ones to account for the intercept term and then applying the normal equation directly to compute $\theta$.”
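That augmentation step can be sketched like this (NumPy; function name and data are illustrative):

```python
import numpy as np

def fit_with_bias(X, y):
    """Prepend a column of ones to X (intercept term), then apply the
    normal equation to get theta = [bias, weights...]."""
    ones = np.ones((X.shape[0], 1))
    X_aug = np.hstack([ones, X])      # first column of ones handles the bias
    return np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)

# y = 1 + 2x, so theta should be ~[1.0, 2.0] (bias first, then slope)
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([3.0, 5.0, 7.0])
theta = fit_with_bias(X, y)
```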