
What is Quantization?

A plain-English explanation of quantization — what it means, why it matters, and how it is used in AI.

Quantization
Quantization is a technique for reducing the size and memory requirements of AI models by representing their weights with lower-precision numbers.
"A 70-billion-parameter model requires about 140GB of GPU memory at 16-bit precision. With 4-bit quantization, it shrinks to roughly 35GB."
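The core idea can be sketched in a few lines. Below is a minimal, illustrative example of symmetric 8-bit weight quantization — the function names are ours, and production libraries add refinements such as per-channel scales and calibration:

```python
def quantize_int8(weights):
    """Map float weights to signed 8-bit integers plus one float scale."""
    scale = max(abs(w) for w in weights) / 127  # largest weight maps to +/-127
    q = [round(w / scale) for w in weights]     # each int fits in one byte
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.063, 0.9]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# q holds small integers (1 byte each instead of 4), and approx stays
# within half a quantization step of the original weights.
```

Each weight now costs 1 byte instead of 4, at the price of a small rounding error bounded by half the scale — the same trade that lets a 70B model fit in a quarter of the memory.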

Also known as: model quantization, weight quantization. Common formats: INT8, INT4.

Why does Quantization matter?

Quantization is essential for running large models on limited hardware — it enables local deployment on laptops and consumer GPUs, usually at the cost of a small loss in accuracy.
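The hardware numbers follow from simple arithmetic: weight memory is roughly parameter count × bits per weight ÷ 8 bytes. A small helper (ours, for illustration) makes the trade-off concrete:

```python
def model_size_gb(params_billions, bits):
    """Approximate weight memory in GB: parameters * bits / 8 bytes."""
    return params_billions * 1e9 * bits / 8 / 1e9

# A 70B-parameter model at different precisions:
fp16 = model_size_gb(70, 16)  # 140.0 GB -> needs multiple data-center GPUs
int4 = model_size_gb(70, 4)   # 35.0 GB  -> fits on one or two consumer GPUs
```

This is weight memory only; activations, the KV cache, and runtime overhead add to the real footprint.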

