deep-learning · nlp · transformers
Attention mechanism
Scaled dot-product attention, multi-head attention, and positional encodings — the core of transformers.
Depth levels
L0 — Intro (~1 h)
Understands attention as a learned weighted sum over context; knows it powers transformers.
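A quick way to see "a learned weighted sum over context": score a query against each context vector, softmax the scores into weights, and average the context with those weights. A minimal NumPy sketch with arbitrary toy vectors, not tied to any particular library:

```python
import numpy as np

def softmax(x):
    x = x - x.max()            # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum()

# Toy context of three 2-d "token" vectors and one query (values are arbitrary).
context = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 1.0]])
query = np.array([1.0, 0.2])

scores = context @ query       # one similarity score per context vector
weights = softmax(scores)      # attention weights, non-negative and summing to 1
output = weights @ context     # the attended output: a weighted sum over the context

print(weights, output)
```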
L1 — Basics (~10 h)
Implements scaled dot-product and multi-head attention from scratch; understands queries, keys, values.
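A from-scratch NumPy sketch of scaled dot-product attention plus a minimal multi-head wrapper. The shapes, the helper names, and the random matrices standing in for the learned projections W_Q, W_K, W_V, W_O are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K: (..., seq, d_k); V: (..., seq, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)   # (..., seq, seq)
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

def multi_head_attention(x, num_heads, rng):
    # x: (seq, d_model). Random projections stand in for learned W_Q/W_K/W_V/W_O.
    seq, d_model = x.shape
    d_head = d_model // num_heads
    W_q, W_k, W_v, W_o = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                          for _ in range(4))

    def split_heads(t):
        # (seq, d_model) -> (num_heads, seq, d_head)
        return t.reshape(seq, num_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split_heads(x @ W_q), split_heads(x @ W_k), split_heads(x @ W_v)
    heads, _ = scaled_dot_product_attention(Q, K, V)
    concat = heads.transpose(1, 0, 2).reshape(seq, d_model)   # re-join the heads
    return concat @ W_o

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))            # 5 tokens, d_model = 8
out = multi_head_attention(x, num_heads=2, rng=rng)
print(out.shape)                           # (5, 8)
```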
L2 — Working (~20 h)
Applies different positional encodings (sinusoidal, RoPE, ALiBi); understands causal masking; implements cross-attention.
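A sketch of the sinusoidal encoding from the original Transformer formula, plus a causal mask applied to the score matrix before the softmax. RoPE and ALiBi are left out here; d_model is assumed even and all names are illustrative:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))   (assumes even d_model)
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def causal_mask(seq_len):
    # True above the diagonal = future positions a token must NOT attend to.
    return np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)

# Masking in practice: set disallowed scores to -inf before the softmax,
# so future tokens receive zero attention weight.
scores = np.random.default_rng(0).standard_normal((4, 4))
scores = np.where(causal_mask(4), -np.inf, scores)
print(sinusoidal_positional_encoding(4, 6).shape)
print(scores)
```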
L3 — Advanced (~35 h)
Understands attention's quadratic time and memory cost; applies sparse/linear variants (Longformer, Linformer) and memory-efficient exact attention (FlashAttention).
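Full attention materializes an n x n score matrix, so time and memory grow quadratically with sequence length. One way to see how a local pattern (the idea behind Longformer's sliding window) cuts that cost is to count which score entries a banded mask keeps. This toy sketch only counts entries; a real sparse or FlashAttention kernel never builds the full matrix at all:

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    # Token i may attend only to tokens j with |i - j| <= window.
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window

n, w = 1024, 32
mask = sliding_window_mask(n, w)
full = n * n                       # entries a dense attention matrix needs: O(n^2)
kept = int(mask.sum())             # entries the banded/local pattern keeps: O(n * w)
print(full, kept, kept / full)     # 1048576 vs ~65500, roughly 6% of the scores
```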
L4 — Research (~70 h)
Contributes to efficient attention research, interpretability of attention patterns, or multi-modal attention.
Resources
L0 — Intro