deep-learning · nlp · transformers
Transformer architecture
Encoder-decoder and decoder-only transformers — the dominant architecture for language and multi-modal models.
Depth levels
L0 — Intro (~1 h)
Knows that transformers power GPT and BERT; understands how self-attention replaces recurrence.
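To make the core idea concrete, here is a minimal NumPy sketch of scaled dot-product self-attention — the operation that replaces recurrence. Weight matrices and shapes are illustrative assumptions, not taken from any particular model:

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence x of shape (T, d)."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (T, T) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # each position mixes all values

# Illustrative shapes: 4 tokens, model width 8, random weights.
rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Unlike an RNN step, every position attends to every other position in one matrix product, which is why the computation parallelises over the sequence.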
L1 — Basics (~15 h)
Implements a minimal GPT-2 from scratch; understands encoder and decoder stacks, feed-forward layers, and residual connections.
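The pieces named at this level fit together in one decoder block. A NumPy sketch of a GPT-2-style pre-LN block (causal attention, GELU feed-forward, residual connections); all weights are random placeholders and the single-head layout is a simplification:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalise each position's feature vector independently.
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def causal_self_attention(x, Wq, Wk, Wv, Wo):
    T, d = x.shape
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(d)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)            # no attending to future tokens
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return (w @ V) @ Wo

def decoder_block(x, attn_w, W1, b1, W2, b2):
    # Pre-LN wiring: normalise, transform, then add back the residual stream.
    x = x + causal_self_attention(layer_norm(x), *attn_w)
    h = layer_norm(x) @ W1 + b1
    h = 0.5 * h * (1 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h**3)))  # tanh GELU
    return x + h @ W2 + b2

# Illustrative usage with random weights.
rng = np.random.default_rng(0)
T, d, d_ff = 4, 8, 16
x = rng.normal(size=(T, d))
attn_w = tuple(rng.normal(size=(d, d)) for _ in range(4))
W1, b1 = rng.normal(size=(d, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d)), np.zeros(d)
y = decoder_block(x, attn_w, W1, b1, W2, b2)
print(y.shape)  # (4, 8)
```

Because of the causal mask, the output at position t depends only on positions 0…t — the property that makes autoregressive training and generation possible.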
L2 — Working (~25 h)
Fine-tunes BERT/GPT on downstream tasks with HuggingFace Transformers; understands layer-norm placement (pre- vs post-LN), the KV cache, and context length.
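Of the concepts at this level, the KV cache is the easiest to demystify in code: during generation, keys and values for past tokens are stored rather than recomputed at every step. A minimal single-head NumPy sketch (random weights, hypothetical `attend_step` helper):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attend_step(x_t, cache):
    """One decoding step: append this token's K/V to the cache, attend over all past."""
    q = x_t @ Wq
    cache["K"].append(x_t @ Wk)   # past keys/values are never recomputed
    cache["V"].append(x_t @ Wv)
    K, V = np.stack(cache["K"]), np.stack(cache["V"])
    s = K @ q / np.sqrt(d)
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ V

cache = {"K": [], "V": []}
for t in range(5):
    out = attend_step(rng.normal(size=d), cache)
print(len(cache["K"]))  # 5 cached key vectors after 5 steps
```

This is why per-token generation cost grows with context length (the cache grows) and why the KV cache dominates inference memory for long contexts.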
L3 — Advanced (~40 h)
Analyses architectural variants (PaLM, LLaMA, Mistral); implements grouped-query and sliding-window attention; understands emergent abilities.
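Grouped-query attention, one of the variants listed above, keeps many query heads but shares a smaller set of key/value heads among them, shrinking the KV cache. A NumPy sketch of the head-sharing idea (causal masking omitted for brevity; shapes and weights are illustrative):

```python
import numpy as np

def grouped_query_attention(x, Wq, Wk, Wv, n_q_heads, n_kv_heads):
    """GQA: n_q_heads query heads share n_kv_heads key/value heads."""
    T, d = x.shape
    hd = d // n_q_heads                      # per-head dimension
    group = n_q_heads // n_kv_heads          # query heads per shared KV head
    Q = (x @ Wq).reshape(T, n_q_heads, hd)
    K = (x @ Wk).reshape(T, n_kv_heads, hd)
    V = (x @ Wv).reshape(T, n_kv_heads, hd)
    outs = []
    for h in range(n_q_heads):
        kv = h // group                      # which shared KV head this query head uses
        s = Q[:, h] @ K[:, kv].T / np.sqrt(hd)
        w = np.exp(s - s.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)
        outs.append(w @ V[:, kv])
    return np.concatenate(outs, axis=-1)     # (T, d)

# Illustrative: 4 query heads sharing 2 KV heads.
rng = np.random.default_rng(0)
T, d, n_q, n_kv = 4, 8, 4, 2
x = rng.normal(size=(T, d))
hd = d // n_q
Wq = rng.normal(size=(d, d))
Wk = rng.normal(size=(d, n_kv * hd))
Wv = rng.normal(size=(d, n_kv * hd))
y = grouped_query_attention(x, Wq, Wk, Wv, n_q, n_kv)
print(y.shape)  # (4, 8)
```

With `n_kv_heads == n_q_heads` this reduces to standard multi-head attention, and with `n_kv_heads == 1` to multi-query attention; LLaMA 2 70B and Mistral sit in between.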
L4 — Research (~80 h)
Contributes to novel transformer architectures, mechanistic interpretability, or scaling laws research.
Resources
L1 — Basics
L2 — Working