Quantization
INT8/INT4/FP8 quantization — reducing model size and inference cost with controlled accuracy loss.
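To make "controlled accuracy loss" concrete, here is a minimal NumPy sketch of symmetric absmax INT8 quantization: weights shrink 4x versus float32 while the round-trip error stays bounded by half the scale. The helper names are illustrative, not from any library.

```python
# Minimal sketch: symmetric absmax INT8 quantize/dequantize round-trip.
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 with a single absmax scale."""
    scale = np.abs(w).max() / 127.0           # largest magnitude maps to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage is 4x smaller; max error is bounded by ~scale/2.
print("max abs error:", np.abs(w - w_hat).max())
print("bytes: fp32", w.nbytes, "-> int8", q.nbytes)
```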
Depth levels
L0 — Intro (~0h)
Knows that quantization reduces model precision to save memory; has heard of 4-bit models.
L1 — Basics (~5h)
Applies GPTQ or bitsandbytes 4-bit quantization to load a large model on a consumer GPU (see the loading sketch after this list).
L2 — Working (~15h)
Understands PTQ vs. QAT; applies calibration (a calibration sketch follows this list); evaluates accuracy-speed trade-offs; uses AWQ, GPTQ, and SmoothQuant.
L3 — Advanced (~30h)
Implements custom quantization schemes; analyzes outlier channels (see the outlier sketch after this list); tunes per-layer precision.
L4 — Research (~60h)
Contributes to 1-bit or mixed-precision research, quantization-aware architectures, or hardware-aware ML.
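The L1 loading sketch: quantizing a model to 4-bit at load time with bitsandbytes through the transformers API. This assumes `transformers`, `accelerate`, and `bitsandbytes` are installed and a CUDA GPU is available; the model id is only an example.

```python
# Load a large model in 4-bit NF4 so it fits on a consumer GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16
)

model_id = "meta-llama/Llama-2-7b-hf"       # example model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                      # place layers across available GPUs
)

inputs = tokenizer("Quantization lets a 7B model fit in", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```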
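The L2 calibration sketch: post-training quantization picks activation scales from a small calibration set. A common trick, shown here with illustrative names in plain NumPy, is clipping at a high percentile instead of the raw absmax, which rare outliers would otherwise dominate.

```python
# PTQ calibration: choose an INT8 activation scale from calibration batches.
import numpy as np

def calibrate_scale(batches, percentile=99.9):
    """Pick a clipping threshold from observed activation magnitudes."""
    mags = np.concatenate([np.abs(b).ravel() for b in batches])
    clip = np.percentile(mags, percentile)  # ignore the extreme tail
    return clip / 127.0

def fake_quant(x, scale):
    """Quantize-dequantize: what the layer sees at inference time."""
    return np.clip(np.round(x / scale), -127, 127) * scale

rng = np.random.default_rng(0)
calib = [rng.standard_normal((32, 768)).astype(np.float32) for _ in range(8)]
scale = calibrate_scale(calib)

x = calib[0]
err = np.abs(x - fake_quant(x, scale)).mean()
print(f"scale={scale:.5f}  mean abs error={err:.5f}")
```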
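The L3 outlier sketch: a few hidden channels in LLM activations carry magnitudes far above the rest, which breaks per-tensor INT8; methods like LLM.int8() and SmoothQuant exist to handle them. This toy sketch on synthetic data shows the basic analysis, with an illustrative threshold.

```python
# Flag activation channels whose absmax dwarfs the typical channel.
import numpy as np

def find_outlier_channels(acts: np.ndarray, ratio: float = 6.0) -> np.ndarray:
    """Return channel indices whose absmax exceeds ratio * median absmax."""
    per_channel_max = np.abs(acts).max(axis=0)   # shape: (hidden,)
    threshold = ratio * np.median(per_channel_max)
    return np.where(per_channel_max > threshold)[0]

rng = np.random.default_rng(0)
acts = rng.standard_normal((1024, 768)).astype(np.float32)
acts[:, [13, 421]] *= 50.0                       # plant two outlier channels

print("outlier channels:", find_outlier_channels(acts))
# A per-layer scheme might keep such channels in fp16 (mixed precision)
# or migrate their scale into the weights, as SmoothQuant does.
```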
Resources
L1 — Basics
L2 — Working