deep-learning · nlp · transformers
Transformer architecture
Encoder-decoder and decoder-only transformers — the dominant architecture for language and multi-modal models.
Depth levels
L0 — Intro (~1 h)
Knows that transformers power GPT and BERT; understands how self-attention replaces recurrence.
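To make the core idea concrete, here is a minimal NumPy sketch of scaled dot-product self-attention — the operation that replaces recurrence. Weight matrices and shapes are illustrative assumptions, not taken from any particular model:

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence x of shape (T, d)."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (T, T) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # each position mixes all values

# Illustrative shapes: 4 tokens, model width 8, random weights.
rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Unlike an RNN step, every position attends to every other position in one matrix product, which is why the computation parallelises over the sequence.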
L1 — Basics (~15 h)
Implements a minimal GPT-2 from scratch; understands encoder and decoder stacks, feed-forward layers, and residual connections.
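The pieces named at this level fit together in one decoder block. A NumPy sketch of a GPT-2-style pre-LN block (causal attention, GELU feed-forward, residual connections); all weights are random placeholders and the single-head layout is a simplification:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalise each position's feature vector independently.
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def causal_self_attention(x, Wq, Wk, Wv, Wo):
    T, d = x.shape
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(d)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)            # no attending to future tokens
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return (w @ V) @ Wo

def decoder_block(x, attn_w, W1, b1, W2, b2):
    # Pre-LN wiring: normalise, transform, then add back the residual stream.
    x = x + causal_self_attention(layer_norm(x), *attn_w)
    h = layer_norm(x) @ W1 + b1
    h = 0.5 * h * (1 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h**3)))  # tanh GELU
    return x + h @ W2 + b2

# Illustrative usage with random weights.
rng = np.random.default_rng(0)
T, d, d_ff = 4, 8, 16
x = rng.normal(size=(T, d))
attn_w = tuple(rng.normal(size=(d, d)) for _ in range(4))
W1, b1 = rng.normal(size=(d, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d)), np.zeros(d)
y = decoder_block(x, attn_w, W1, b1, W2, b2)
print(y.shape)  # (4, 8)
```

Because of the causal mask, the output at position t depends only on positions 0…t — the property that makes autoregressive training and generation possible.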
L2 — Working (~25 h)
Fine-tunes BERT/GPT on downstream tasks with HuggingFace Transformers; understands layer-norm placement (pre- vs post-LN), the KV cache, and context length.
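Of the concepts at this level, the KV cache is the easiest to demystify in code: during generation, keys and values for past tokens are stored rather than recomputed at every step. A minimal single-head NumPy sketch (random weights, hypothetical `attend_step` helper):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attend_step(x_t, cache):
    """One decoding step: append this token's K/V to the cache, attend over all past."""
    q = x_t @ Wq
    cache["K"].append(x_t @ Wk)   # past keys/values are never recomputed
    cache["V"].append(x_t @ Wv)
    K, V = np.stack(cache["K"]), np.stack(cache["V"])
    s = K @ q / np.sqrt(d)
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ V

cache = {"K": [], "V": []}
for t in range(5):
    out = attend_step(rng.normal(size=d), cache)
print(len(cache["K"]))  # 5 cached key vectors after 5 steps
```

This is why per-token generation cost grows with context length (the cache grows) and why the KV cache dominates inference memory for long contexts.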
L3 — Advanced (~40 h)
Analyses architectural variants (PaLM, LLaMA, Mistral); implements grouped-query and sliding-window attention; understands emergent abilities.
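Grouped-query attention, one of the variants listed above, keeps many query heads but shares a smaller set of key/value heads among them, shrinking the KV cache. A NumPy sketch of the head-sharing idea (causal masking omitted for brevity; shapes and weights are illustrative):

```python
import numpy as np

def grouped_query_attention(x, Wq, Wk, Wv, n_q_heads, n_kv_heads):
    """GQA: n_q_heads query heads share n_kv_heads key/value heads."""
    T, d = x.shape
    hd = d // n_q_heads                      # per-head dimension
    group = n_q_heads // n_kv_heads          # query heads per shared KV head
    Q = (x @ Wq).reshape(T, n_q_heads, hd)
    K = (x @ Wk).reshape(T, n_kv_heads, hd)
    V = (x @ Wv).reshape(T, n_kv_heads, hd)
    outs = []
    for h in range(n_q_heads):
        kv = h // group                      # which shared KV head this query head uses
        s = Q[:, h] @ K[:, kv].T / np.sqrt(hd)
        w = np.exp(s - s.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)
        outs.append(w @ V[:, kv])
    return np.concatenate(outs, axis=-1)     # (T, d)

# Illustrative: 4 query heads sharing 2 KV heads.
rng = np.random.default_rng(0)
T, d, n_q, n_kv = 4, 8, 4, 2
x = rng.normal(size=(T, d))
hd = d // n_q
Wq = rng.normal(size=(d, d))
Wk = rng.normal(size=(d, n_kv * hd))
Wv = rng.normal(size=(d, n_kv * hd))
y = grouped_query_attention(x, Wq, Wk, Wv, n_q, n_kv)
print(y.shape)  # (4, 8)
```

With `n_kv_heads == n_q_heads` this reduces to standard multi-head attention, and with `n_kv_heads == 1` to multi-query attention; LLaMA 2 70B and Mistral sit in between.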
L4 — Research (~80 h)
Contributes to novel transformer architectures, mechanistic interpretability, or scaling laws research.
Resources
L1 — Basics
L2 — Working