Deep Learning Dimensions
- B = batch size
- L = time steps (sequence length)
- C = channels (number of features, embedding dimension)
We don’t really have the concept of L in a vanilla NN. Each input is just a vector (or a batch of vectors), without an inherent temporal or sequential dimension.
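A minimal sketch of the shape difference, using numpy arrays (the sizes 32, 10, 64 are hypothetical):

```python
import numpy as np

B, L, C = 32, 10, 64  # hypothetical batch size, sequence length, channels

# Sequence models (RNNs, Transformers) take a 3-D input: (B, L, C)
x_seq = np.zeros((B, L, C))

# A vanilla NN has no L dimension; each input is just a C-vector,
# so a batch of inputs is 2-D: (B, C)
x_mlp = np.zeros((B, C))

print(x_seq.shape)  # (32, 10, 64)
print(x_mlp.shape)  # (32, 64)
```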
Features
Sometimes, I see people referring to the # of features of a hidden layer. This is just the # of channels, i.e. the number of nodes in that layer (when batch size = 1).
The number of channels can change after each layer, but generally batch size is fixed throughout the entire network.
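A sketch of this with plain numpy matrix multiplies (the layer widths 64 → 128 → 10 are made up for illustration): each weight matrix changes the channel dimension, while B passes through unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
B = 32
x = rng.normal(size=(B, 64))      # input: 64 channels

W1 = rng.normal(size=(64, 128))   # layer 1: 64 -> 128 channels
W2 = rng.normal(size=(128, 10))   # layer 2: 128 -> 10 channels

h = np.maximum(x @ W1, 0)         # ReLU hidden layer, shape (B, 128)
y = h @ W2                        # output, shape (B, 10)

print(h.shape)  # (32, 128)
print(y.shape)  # (32, 10)
```

The batch size stays 32 at every layer; only the channel dimension changes.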