Deep Learning Dimensions

  • B = batch size
  • L = time steps (sequence length)
  • C = channels (number of features, embedding dimension)

We don’t really have the concept of L in a vanilla NN. Each input is just a vector (or a batch of vectors) of shape (B, C), without an inherent temporal or sequential dimension.
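A minimal sketch of the shape difference, using numpy arrays as stand-ins for tensors (the sizes 32, 10, and 64 are arbitrary examples):

```python
import numpy as np

# Vanilla feed-forward network: each input is a flat feature vector.
B, C = 32, 64                 # batch size, number of channels (features)
x_mlp = np.zeros((B, C))      # shape (B, C): no time dimension

# Sequence model (e.g. an RNN or Transformer): inputs carry a time axis.
L = 10                        # number of time steps (sequence length)
x_seq = np.zeros((B, L, C))   # shape (B, L, C)

print(x_mlp.shape)  # (32, 64)
print(x_seq.shape)  # (32, 10, 64)
```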

Features

Sometimes, I see people referring to the # of features of a hidden layer. This is just the # of channels, i.e. the number of nodes in that layer (when batch size = 1, the layer's output is a vector of that length).

The number of channels can change after each layer, but generally batch size is fixed throughout the entire network.
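To illustrate, here is a toy forward pass through a stack of linear layers: the channel count changes at every layer while the batch dimension is carried through unchanged. The layer widths are made-up example values:

```python
import numpy as np

B = 32                            # batch size: fixed for the whole forward pass
channels = [64, 128, 256, 10]     # channels per layer (hypothetical widths)

x = np.random.randn(B, channels[0])
for c_in, c_out in zip(channels[:-1], channels[1:]):
    W = np.random.randn(c_in, c_out) * 0.01   # weight matrix maps c_in -> c_out
    x = np.maximum(x @ W, 0)                  # linear layer + ReLU
    print(x.shape)                            # (B, c_out): channels change, B does not
```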