Infrastructure · intermediate

What is Latency?

A plain-English explanation of latency — what it means, why it matters, and how it is used in AI.

Latency
Latency in AI systems refers to the time delay between sending a request to a model and receiving a response.
"A coding assistant that takes 10 seconds to respond feels frustrating, while one that streams responses token-by-token feels fast and responsive."

Also known as: Response latency, inference latency, time to first token
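To make the distinction between total latency and time to first token concrete, here is a minimal sketch in Python. It simulates a model that streams tokens one at a time (the `fake_stream` generator and its delay are illustrative stand-ins, not a real model API) and measures both metrics with `time.perf_counter`:

```python
import time

def fake_stream(tokens, delay=0.01):
    # Stand-in for a streaming model response: yields one token at a time,
    # with a small artificial delay per token.
    for tok in tokens:
        time.sleep(delay)
        yield tok

def measure_latency(stream):
    """Consume a token stream and return (time_to_first_token, total_latency) in seconds."""
    start = time.perf_counter()
    ttft = None
    for _tok in stream:
        if ttft is None:
            # First token arrived: record time to first token.
            ttft = time.perf_counter() - start
    total = time.perf_counter() - start
    return ttft, total

ttft, total = measure_latency(fake_stream(["Hello", ",", " world"]))
print(f"time to first token: {ttft:.3f}s, total latency: {total:.3f}s")
```

With streaming, the user starts reading after `ttft`, even though the full response takes `total` — which is why a streaming assistant feels faster than one that waits for the complete answer.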

Why does Latency matter?

Latency is a key engineering concern for any AI product because it directly affects user satisfaction: for interactive tools like chat assistants, time to first token often matters more than total response time, which is why streaming output makes a model feel faster.

Practice this term

The best way to remember Latency is to practice unscrambling it. AI Terminology Scrambler uses spaced repetition to help you learn and retain AI vocabulary in just a few minutes a day.

Practice Latency now →
