deep-learning · nlp · transformers
Attention mechanism
Scaled dot-product attention, multi-head attention, and positional encodings — the core of transformers.
Depth levels
L0 — Intro (~1 h)
Understands attention as a learned weighted sum over context; knows it powers transformers.
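A quick way to see "a learned weighted sum over context": score a query against each context vector, softmax the scores into weights, and average the context with those weights. A minimal NumPy sketch with arbitrary toy vectors, not tied to any particular library:

```python
import numpy as np

def softmax(x):
    x = x - x.max()            # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum()

# Toy context of three 2-d "token" vectors and one query (values are arbitrary).
context = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 1.0]])
query = np.array([1.0, 0.2])

scores = context @ query       # one similarity score per context vector
weights = softmax(scores)      # attention weights, non-negative and summing to 1
output = weights @ context     # the attended output: a weighted sum over the context

print(weights, output)
```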
L1 — Basics (~10 h)
Implements scaled dot-product and multi-head attention from scratch; understands queries, keys, values.
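A from-scratch NumPy sketch of scaled dot-product attention plus a minimal multi-head wrapper. The shapes, the helper names, and the random matrices standing in for the learned projections W_Q, W_K, W_V, W_O are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K: (..., seq, d_k); V: (..., seq, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)   # (..., seq, seq)
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

def multi_head_attention(x, num_heads, rng):
    # x: (seq, d_model). Random projections stand in for learned W_Q/W_K/W_V/W_O.
    seq, d_model = x.shape
    d_head = d_model // num_heads
    W_q, W_k, W_v, W_o = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                          for _ in range(4))

    def split_heads(t):
        # (seq, d_model) -> (num_heads, seq, d_head)
        return t.reshape(seq, num_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split_heads(x @ W_q), split_heads(x @ W_k), split_heads(x @ W_v)
    heads, _ = scaled_dot_product_attention(Q, K, V)
    concat = heads.transpose(1, 0, 2).reshape(seq, d_model)   # re-join the heads
    return concat @ W_o

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))            # 5 tokens, d_model = 8
out = multi_head_attention(x, num_heads=2, rng=rng)
print(out.shape)                           # (5, 8)
```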
L2 — Working (~20 h)
Applies different positional encodings (sinusoidal, RoPE, ALiBi); understands causal masking; implements cross-attention.
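A sketch of the sinusoidal encoding from the original Transformer formula, plus a causal mask applied to the score matrix before the softmax. RoPE and ALiBi are left out here; d_model is assumed even and all names are illustrative:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))   (assumes even d_model)
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def causal_mask(seq_len):
    # True above the diagonal = future positions a token must NOT attend to.
    return np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)

# Masking in practice: set disallowed scores to -inf before the softmax,
# so future tokens receive zero attention weight.
scores = np.random.default_rng(0).standard_normal((4, 4))
scores = np.where(causal_mask(4), -np.inf, scores)
print(sinusoidal_positional_encoding(4, 6).shape)
print(scores)
```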
L3 — Advanced (~35 h)
Understands attention's quadratic time and memory cost; applies sparse/linear variants (Longformer, Linformer) and memory-efficient exact attention (FlashAttention).
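Full attention materializes an n x n score matrix, so time and memory grow quadratically with sequence length. One way to see how a local pattern (the idea behind Longformer's sliding window) cuts that cost is to count which score entries a banded mask keeps. This toy sketch only counts entries; a real sparse or FlashAttention kernel never builds the full matrix at all:

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    # Token i may attend only to tokens j with |i - j| <= window.
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window

n, w = 1024, 32
mask = sliding_window_mask(n, w)
full = n * n                       # entries a dense attention matrix needs: O(n^2)
kept = int(mask.sum())             # entries the banded/local pattern keeps: O(n * w)
print(full, kept, kept / full)     # 1048576 vs ~65500, roughly 6% of the scores
```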
L4 — Research (~70 h)
Contributes to efficient attention research, interpretability of attention patterns, or multi-modal attention.
Resources
L0 — Intro