Quantization
INT8/INT4/FP8 quantization — reducing model size and inference cost with controlled accuracy loss.
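To make "controlled accuracy loss" concrete, here is a minimal NumPy sketch of symmetric absmax INT8 quantization: weights shrink 4x versus float32 while the round-trip error stays bounded by half the scale. The helper names are illustrative, not from any library.

```python
# Minimal sketch: symmetric absmax INT8 quantize/dequantize round-trip.
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 with a single absmax scale."""
    scale = np.abs(w).max() / 127.0           # largest magnitude maps to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage is 4x smaller; max error is bounded by ~scale/2.
print("max abs error:", np.abs(w - w_hat).max())
print("bytes: fp32", w.nbytes, "-> int8", q.nbytes)
```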
Depth levels
L0 — Intro (~0h)
Knows that quantization reduces model precision to save memory; has heard of 4-bit models.
L1 — Basics (~5h)
Applies GPTQ or bitsandbytes 4-bit quantization to load a large model on a consumer GPU (see the loading sketch after this list).
L2 — Working (~15h)
Understands PTQ vs. QAT; applies calibration (a calibration sketch follows this list); evaluates accuracy-speed trade-offs; uses AWQ, GPTQ, and SmoothQuant.
L3 — Advanced (~30h)
Implements custom quantization schemes; analyzes outlier channels (see the outlier sketch after this list); tunes per-layer precision.
L4 — Research (~60h)
Contributes to 1-bit or mixed-precision research, quantization-aware architectures, or hardware-aware ML.
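The L1 loading sketch: quantizing a model to 4-bit at load time with bitsandbytes through the transformers API. This assumes `transformers`, `accelerate`, and `bitsandbytes` are installed and a CUDA GPU is available; the model id is only an example.

```python
# Load a large model in 4-bit NF4 so it fits on a consumer GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16
)

model_id = "meta-llama/Llama-2-7b-hf"       # example model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                      # place layers across available GPUs
)

inputs = tokenizer("Quantization lets a 7B model fit in", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```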
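The L2 calibration sketch: post-training quantization picks activation scales from a small calibration set. A common trick, shown here with illustrative names in plain NumPy, is clipping at a high percentile instead of the raw absmax, which rare outliers would otherwise dominate.

```python
# PTQ calibration: choose an INT8 activation scale from calibration batches.
import numpy as np

def calibrate_scale(batches, percentile=99.9):
    """Pick a clipping threshold from observed activation magnitudes."""
    mags = np.concatenate([np.abs(b).ravel() for b in batches])
    clip = np.percentile(mags, percentile)  # ignore the extreme tail
    return clip / 127.0

def fake_quant(x, scale):
    """Quantize-dequantize: what the layer sees at inference time."""
    return np.clip(np.round(x / scale), -127, 127) * scale

rng = np.random.default_rng(0)
calib = [rng.standard_normal((32, 768)).astype(np.float32) for _ in range(8)]
scale = calibrate_scale(calib)

x = calib[0]
err = np.abs(x - fake_quant(x, scale)).mean()
print(f"scale={scale:.5f}  mean abs error={err:.5f}")
```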
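The L3 outlier sketch: a few hidden channels in LLM activations carry magnitudes far above the rest, which breaks per-tensor INT8; methods like LLM.int8() and SmoothQuant exist to handle them. This toy sketch on synthetic data shows the basic analysis, with an illustrative threshold.

```python
# Flag activation channels whose absmax dwarfs the typical channel.
import numpy as np

def find_outlier_channels(acts: np.ndarray, ratio: float = 6.0) -> np.ndarray:
    """Return channel indices whose absmax exceeds ratio * median absmax."""
    per_channel_max = np.abs(acts).max(axis=0)   # shape: (hidden,)
    threshold = ratio * np.median(per_channel_max)
    return np.where(per_channel_max > threshold)[0]

rng = np.random.default_rng(0)
acts = rng.standard_normal((1024, 768)).astype(np.float32)
acts[:, [13, 421]] *= 50.0                       # plant two outlier channels

print("outlier channels:", find_outlier_channels(acts))
# A per-layer scheme might keep such channels in fp16 (mixed precision)
# or migrate their scale into the weights, as SmoothQuant does.
```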
Resources
L1 — Basics
L2 — Working