Sequence-To-Sequence Model (Seq2Seq)

Sequence-to-sequence modeling is a family of models for transforming one sequence into another.

Examples:

  • Machine translation: Input: “Bonjour le monde” → Output: “Hello world.”
  • Speech recognition: Input: audio frames → Output: text transcription.
  • Text summarization: Input: long document → Output: concise summary.
  • Dialogue systems: Input: user utterance → Output: response.

Is next token prediction Seq2Seq?

Not by itself: plain next-token prediction (language modeling) is autoregressive, but there is no separate source sequence being mapped to a target sequence, so it isn’t seq2seq in the classic sense.

The original Transformer was designed as a seq2seq model (it was introduced for machine translation).

Most classic seq2seq models (RNN encoder–decoder, Transformer for MT, summarization, etc.) are autoregressive.

The decoder generates one token at a time, left-to-right, conditioning on previous outputs.
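
In symbols, this is the standard chain-rule factorization of the output distribution (not specific to any one architecture):

$$
p(y_{1:T} \mid x) \;=\; \prod_{t=1}^{T} p\big(y_t \mid y_{<t},\, x\big)
$$

Greedy decoding takes the argmax of each factor and feeds it back in as the next input; beam search keeps several candidate prefixes instead.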

Why don’t seq2seq models just generate the entire output sequence in one shot (like one big classifier over all possible output sequences), instead of autoregressively?

Treating every possible output sequence as its own class is intractable: the number of candidate sequences grows exponentially with output length.

Autoregression breaks this into tractable steps, one next-token prediction at a time.
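
A rough back-of-the-envelope check (the vocabulary size and output length below are made-up illustrative numbers, not from the source):

```python
# Illustrative numbers only: a 32k-token vocabulary and a 20-token output.
V = 32_000   # vocabulary size (hypothetical)
T = 20       # output length (hypothetical)

# One-shot "classifier over all possible sequences": one class per sequence.
num_classes_one_shot = V ** T
print(f"one-shot classes: {num_classes_one_shot:.3e}")        # ~1.3e+90 classes

# Autoregressive factorization: T separate V-way softmax decisions.
num_decisions_autoregressive = T * V
print(f"autoregressive decisions: {num_decisions_autoregressive}")  # 640000
```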

Encoder-decoder RNN (CS231n 2024 Lec 7)

Before Transformers, seq2seq was a “many-to-one” RNN glued to a “one-to-many” RNN (Sutskever et al., NIPS 2014):

  1. Encoder (many-to-one) — an RNN with its own weights reads the input sequence and produces a final hidden state that’s meant to summarize the entire input in a single fixed-size vector.
  2. Decoder (one-to-many) — a second RNN with separate weights is initialized from the encoder’s final hidden state and autoregressively generates the output sequence.

Bottleneck: the entire source sentence has to fit through that single fixed-size hidden state. Attention was invented to let the decoder look back at all encoder hidden states directly instead of relying on the bottleneck.
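
A minimal sketch of this encoder-decoder pattern, assuming PyTorch; the GRU, hidden size, vocabulary sizes, and greedy decoding loop are illustrative choices, not the exact Sutskever et al. setup (which used deep LSTMs):

```python
import torch
import torch.nn as nn

class Seq2SeqRNN(nn.Module):
    """Sutskever-style encoder-decoder: the only thing passed from encoder
    to decoder is the final hidden state (the fixed-size bottleneck)."""

    def __init__(self, src_vocab, tgt_vocab, emb=64, hidden=128):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, emb)
        self.tgt_embed = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)   # many-to-one
        self.decoder = nn.GRU(emb, hidden, batch_first=True)   # one-to-many
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt_in):
        # Encoder: read the whole source, keep only the final hidden state.
        _, h = self.encoder(self.src_embed(src))          # h: (1, batch, hidden)
        # Decoder: initialized from h, predicts the next target token at each step.
        dec_out, _ = self.decoder(self.tgt_embed(tgt_in), h)
        return self.out(dec_out)                           # (batch, tgt_len, tgt_vocab)

    @torch.no_grad()
    def greedy_decode(self, src, bos_id, eos_id, max_len=20):
        _, h = self.encoder(self.src_embed(src))
        tok = torch.full((src.size(0), 1), bos_id, dtype=torch.long)
        out = []
        for _ in range(max_len):                           # autoregressive loop
            dec_out, h = self.decoder(self.tgt_embed(tok), h)
            tok = self.out(dec_out[:, -1]).argmax(-1, keepdim=True)
            out.append(tok)
            if (tok == eos_id).all():
                break
        return torch.cat(out, dim=1)

# Toy usage with made-up vocabulary sizes and random token ids.
model = Seq2SeqRNN(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (2, 7))                       # batch of 2 source sequences
print(model.greedy_decode(src, bos_id=1, eos_id=2).shape)
```

The single hidden state `h` handed from the encoder GRU to the decoder GRU is exactly the bottleneck described above; attention instead lets each decoder step look at all encoder outputs.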

Source

CS231n 2024 Lec 7 slides 42–43 (Sutskever seq2seq, many-to-one + one-to-many encoder-decoder).