Hyperparameter Tuning (Neural Network Training)

Below is everything I have learned about training neural networks in practice. Also see CNN for using CNNs in practice.

Google Research released a new guide for tuning neural networks: https://github.com/google-research/tuning_playbook

In practice, we can use model ensembles to gain about 2% extra performance:

  1. Train multiple independent models
  2. At test time, average their results
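A minimal NumPy sketch of step 2, assuming each model maps a batch of inputs to class logits and that we average the softmax probabilities (averaging logits is another common choice). The random linear "models" are hypothetical stand-ins for independently trained networks.

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability before exponentiating
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ensemble_predict(models, X):
    """Average the class probabilities of several independently trained models."""
    probs = [softmax(model(X)) for model in models]
    return np.mean(probs, axis=0)

rng = np.random.default_rng(0)
# Hypothetical stand-ins for 5 trained networks: random linear maps, 4 features -> 3 classes
weights = [rng.standard_normal((4, 3)) for _ in range(5)]
models = [lambda X, W=W: X @ W for W in weights]

X = rng.standard_normal((8, 4))          # a test batch of 8 examples
avg_probs = ensemble_predict(models, X)  # shape (8, 3), each row sums to 1
predictions = avg_probs.argmax(axis=1)   # final ensemble prediction per example
```

Since the models are trained independently (e.g. from different random initializations), their errors are partly uncorrelated, and averaging smooths them out.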

We can also use Polyak averaging: instead of using the actual parameter vector at test time, we keep a moving average of the parameter vector during training and use that at test time:

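```python
# x: parameter vector, x_test: its moving average (dataset, network, learning_rate assumed defined)
x_test = x.copy()
while True:
    data_batch = dataset.sample_data_batch()
    loss = network.forward(data_batch)
    dx = network.backward()
    x -= learning_rate * dx                # SGD update on the actual parameters
    x_test = 0.995 * x_test + 0.005 * x    # moving average, used for the test set
```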
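A runnable toy version of the loop above, under assumed settings: noisy SGD on the hypothetical loss f(x) = 0.5 * ||x - target||^2, where the noise stands in for minibatch gradient noise. It uses the exponential moving average variant from these notes (classic Polyak-Ruppert averaging uses a uniform running mean of the iterates instead). The averaged parameters end up much closer to the optimum than the last SGD iterate, which keeps bouncing around.

```python
import numpy as np

def polyak_sgd_demo(steps=5000, dim=50, learning_rate=0.1, decay=0.995, noise_std=1.0, seed=0):
    """Noisy SGD on f(x) = 0.5 * ||x - target||^2 while tracking an EMA of the parameters."""
    rng = np.random.default_rng(seed)
    target = np.ones(dim)      # minimizer of the toy loss (hypothetical problem)
    x = np.zeros(dim)          # parameters being trained
    x_test = x.copy()          # moving average of the parameters, used only at test time
    for _ in range(steps):
        # Exact gradient (x - target) plus Gaussian noise, mimicking minibatch noise
        dx = (x - target) + noise_std * rng.standard_normal(dim)
        x -= learning_rate * dx
        x_test = decay * x_test + (1 - decay) * x
    # Distance of the last iterate and of the averaged parameters from the optimum
    return np.linalg.norm(x - target), np.linalg.norm(x_test - target)

sgd_err, ema_err = polyak_sgd_demo()
print(f"last iterate error: {sgd_err:.3f}, averaged error: {ema_err:.3f}")
```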

Regularization

Hyperparameter Tuning

Hyperparameters are choices about the algorithm that we set rather than learn (see https://cs231n.github.io/neural-networks-3/).

They are very problem-dependent, so we usually have to try several values and see what works best.

For hyperparameter tuning, use the validation set (see Dataset), not the test set.
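A minimal sketch of that workflow, using a hypothetical setup: the regularization strength of closed-form ridge regression on synthetic data stands in for a network hyperparameter. Every candidate is scored on the validation split, and the test split is used exactly once, after the choice is made.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical synthetic regression data: y = X @ w_true + noise
n, d = 300, 20
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.5 * rng.standard_normal(n)

# One fixed split: train / validation / test
X_train, y_train = X[:200], y[:200]
X_val, y_val = X[200:250], y[200:250]
X_test, y_test = X[250:], y[250:]

def fit_ridge(X, y, reg):
    """Closed-form ridge regression: w = (X^T X + reg * I)^-1 X^T y."""
    return np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ y)

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

# Score each candidate hyperparameter on the validation set only
candidates = [1e-3, 1e-2, 1e-1, 1.0, 10.0, 100.0]
val_errors = {reg: mse(fit_ridge(X_train, y_train, reg), X_val, y_val) for reg in candidates}
best_reg = min(val_errors, key=val_errors.get)

# Touch the test set once, with the chosen hyperparameter, to report final performance
test_error = mse(fit_ridge(X_train, y_train, best_reg), X_test, y_test)
print(f"best reg = {best_reg}, test MSE = {test_error:.3f}")
```

Choosing the hyperparameter by test error instead would make the reported test number optimistically biased.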

Some hyperparameters:

  • Learning rate / step size: the most important one to figure out, since it determines how large each Gradient Descent update is
    • set too small, learning is very slow
    • set too large, updates overshoot and training becomes unstable (i.e. the loss sometimes gets bigger and sometimes smaller from one training step to the next)
    • consider learning rate decay
  • Regularization parameters, such as the dropout rate
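A toy illustration of the learning-rate bullets, assuming plain gradient descent on f(x) = x^2 (gradient 2x), where each step multiplies x by (1 - 2 * lr). The step and exponential decay schedules below are two common forms; their names and default constants are illustrative choices, not from any particular library.

```python
import math

def gradient_descent(learning_rate, steps=50, x0=1.0, schedule=None):
    """Minimize f(x) = x**2 with gradient descent and return the final loss.

    If given, `schedule(base_lr, step)` returns the learning rate to use at that step.
    """
    x = x0
    for t in range(steps):
        lr = schedule(learning_rate, t) if schedule else learning_rate
        x -= lr * 2 * x  # gradient of x**2 is 2x
    return x ** 2

def step_decay(base_lr, t, drop=0.5, every=10):
    """Multiply the learning rate by `drop` every `every` steps."""
    return base_lr * drop ** (t // every)

def exponential_decay(base_lr, t, k=0.05):
    """Smoothly shrink the learning rate: lr = base_lr * exp(-k * t)."""
    return base_lr * math.exp(-k * t)

for lr in (0.01, 0.4, 1.1):  # too small, reasonable, too large
    print(f"lr={lr}: final loss {gradient_descent(lr):.3g}")

# Starting too large but decaying: the early steps grow the loss, later steps recover
print("lr=1.1 with step decay:", gradient_descent(1.1, schedule=step_decay))
```

With lr=0.01 the loss is still far from 0 after 50 steps, with lr=1.1 it blows up, and step decay lets the same too-large starting rate settle once it has been halved a couple of times.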

Andrej Karpathy says that our loss curve shouldn't look like a hockey stick!! Omg, all my training curves used to look like that. The hockey-stick shape comes from the easy early gains after a bad initialization: in the first few iterations the network is basically just learning the bias (e.g. the class frequencies). With a good initialization, the loss curve should not have that sharp initial drop.
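One way to remove those easy early gains, sketched under assumptions: on an imbalanced, hypothetical 3-class dataset, initialize the final layer's bias to the log class priors, and assume the last layer's weights start near zero so its output at initialization is just that bias. The starting loss then already equals the entropy of the label distribution instead of ln(3).

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed with a stable log-softmax."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
priors = np.array([0.90, 0.09, 0.01])             # hypothetical class frequencies
labels = rng.choice(3, size=10_000, p=priors)
n = len(labels)

zero_bias = np.zeros((n, 3))                      # bias = 0: uniform prediction at init
prior_bias = np.tile(np.log(priors), (n, 1))      # bias = log(priors): softmax gives the priors

loss_uniform = cross_entropy(zero_bias, labels)   # exactly ln(3) with all-zero logits
loss_prior = cross_entropy(prior_bias, labels)    # close to the entropy of `priors`
print(f"zero bias: {loss_uniform:.3f}, log-prior bias: {loss_prior:.3f}")
```

The gap between the two numbers is roughly the part of the loss a badly initialized network would spend its first iterations removing.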

At initialization, each class should get roughly uniform probability, so you should know approximately what the initial loss is: for a softmax classifier with C classes it is about -ln(1/C) = ln C (e.g. about 2.303 for 10 classes).
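A quick sanity check of that expectation, assuming a hypothetical linear softmax classifier with small random weights on a random batch: because the logits are all close to 0, the measured loss should land very near ln C.

```python
import numpy as np

num_classes = 10
expected = np.log(num_classes)  # -ln(1/C) for softmax cross-entropy

rng = np.random.default_rng(0)
X = rng.standard_normal((512, 100))                    # hypothetical input batch
W = 0.001 * rng.standard_normal((100, num_classes))    # small init -> near-uniform softmax
labels = rng.integers(0, num_classes, size=512)

logits = X @ W
z = logits - logits.max(axis=1, keepdims=True)         # stable log-softmax
log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
initial_loss = -log_probs[np.arange(512), labels].mean()
print(f"expected ~{expected:.3f}, measured {initial_loss:.3f}")
```

If the measured value is far from ln C, something is off (initialization scale, loss implementation, or labels) before any training has happened.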

There are two ways to search for hyperparameters:

Also see [[notes/Computer Vision#Tips for doing well on benchmarks/winning competitions|Computer Vision#Tips for doing well on benchmarks/winning competitions]]