A plain-English explanation of latency: what it means, why it matters, and how it is used in AI.
Also known as: Response latency, inference latency, time to first token
Latency is the delay between sending a request and receiving a response. It is a key engineering concern for any AI product because it directly affects user satisfaction.
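To make the difference between time to first token and overall response latency concrete, here is a minimal Python sketch. It assumes a streaming model can be treated as an iterator of tokens. The names `fake_token_stream` and `measure_latency` are hypothetical, and the stream is a stand-in built from `time.sleep`, not a real model API.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def fake_token_stream(n_tokens: int = 5,
                      startup_s: float = 0.05,
                      per_token_s: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model response.

    Waits startup_s before the first token, then per_token_s between tokens.
    """
    time.sleep(startup_s)  # simulated queueing and prompt-processing delay
    for i in range(n_tokens):
        if i > 0:
            time.sleep(per_token_s)  # simulated per-token generation delay
        yield f"tok{i}"


def measure_latency(stream: Iterable[str]) -> Tuple[Optional[float], float, int]:
    """Return (time_to_first_token, total_latency, token_count) in seconds.

    time_to_first_token is None if the stream produced no tokens.
    """
    start = time.perf_counter()
    first_token_at: Optional[float] = None
    count = 0
    for _ in stream:
        if first_token_at is None:
            # Elapsed time until the first token arrived.
            first_token_at = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return first_token_at, total, count


if __name__ == "__main__":
    ttft, total, count = measure_latency(fake_token_stream())
    print(f"time to first token: {ttft:.3f}s")
    print(f"total latency:       {total:.3f}s for {count} tokens")
```

In a streaming interface, time to first token is often what users perceive as responsiveness, while total latency determines how long they wait for the complete answer. To measure a real system, the stand-in generator would be replaced with whatever token iterator that system's client provides.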
The best way to remember Latency is to practice unscrambling it. AI Terminology Scrambler uses spaced repetition to help you learn and retain AI vocabulary in just a few minutes a day.