Activation functions
ReLU, GELU, SiLU/Swish, softmax — the nonlinearities that give neural networks their expressive power.
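To make the four functions concrete, here is a minimal NumPy sketch of each, assuming the standard tanh approximation for GELU and the usual max-subtraction trick for a numerically stable softmax:

```python
import numpy as np

def relu(x):
    # Zeroes out negative pre-activations; cheap, but units can "die" (see L1).
    return np.maximum(0.0, x)

def gelu(x):
    # tanh approximation of GELU: smooth near zero, ReLU-like in the tails.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def silu(x):
    # SiLU / Swish: x * sigmoid(x).
    return x / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    # Subtract the max first so exp() cannot overflow.
    z = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return z / np.sum(z, axis=axis, keepdims=True)
```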
Depth levels
L0 — Intro (~1 h)
Knows that without a non-linearity, a stack of linear layers collapses to a single linear model (first sketch after this list).
L1 — Basics (~4 h)
Draws sigmoid, tanh, and ReLU; explains the dying-ReLU and vanishing-gradient problems (second sketch below).
L2 — Working (~8 h)
Uses GELU / SiLU in transformers; matches weight initialisation to the activation (third sketch below).
L3 — Advanced (~15 h)
Derives the gradient-scaling factor of each activation; connects it to spectral properties (fourth sketch below).
L4 — Research (~30 h)
Learnable and search-based activations; connection to scaling laws (final sketch below).
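First sketch (L0): with no non-linearity, stacking linear layers buys nothing, because the composition of linear maps is itself a single linear map. A minimal check in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 3))

# Two stacked linear layers without an activation...
deep = (x @ W1) @ W2
# ...equal exactly one linear layer with the composed weight matrix.
assert np.allclose(deep, x @ (W1 @ W2))
```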
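Second sketch (L1): sigmoid'(x) = s(1 - s) never exceeds 1/4, so backpropagating through many sigmoid layers shrinks gradients roughly geometrically (vanishing gradients), while a ReLU unit whose pre-activations are all negative passes zero gradient and can never recover (dying ReLU). A rough illustration, with hypothetical toy numbers:

```python
import numpy as np

# Vanishing gradients: the gradient magnitude after `depth` sigmoid layers
# is bounded by 0.25**depth (times the product of weight norms).
depth = 10
print("sigmoid bound:", 0.25 ** depth)  # ~9.5e-07 after just 10 layers

# Dying ReLU: if every pre-activation of a unit is negative,
# its gradient is identically zero and the unit never updates again.
pre_acts = np.array([-3.1, -0.4, -2.2, -0.7])
relu_grad = (pre_acts > 0).astype(float)
print("dead unit gradient:", relu_grad)  # all zeros
```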
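Third sketch (L2): "matching initialisation to the activation" usually means scaling the weight variance by an activation-specific gain so signal variance is roughly preserved layer to layer. A sketch assuming Glorot-style scaling with the gain values used by PyTorch's `calculate_gain`; `init_weight` is a hypothetical helper, not a named library API:

```python
import numpy as np

# Gain per activation: He gain sqrt(2) for ReLU (half the inputs are zeroed),
# the conventional 5/3 for tanh, and 1 for a linear layer.
GAINS = {"relu": np.sqrt(2.0), "tanh": 5.0 / 3.0, "linear": 1.0}

def init_weight(fan_in, fan_out, activation, rng):
    # Glorot-style scaling times the gain, compensating for how the
    # activation shrinks the variance of the signal passing through it.
    std = GAINS[activation] * np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

W = init_weight(512, 512, "relu", np.random.default_rng(0))
```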
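Fourth sketch (L3): one way to derive the gradient scaling of an activation is to estimate E[f'(x)^2] for x ~ N(0, 1), the factor by which a layer of that activation multiplies backward gradient variance; this is also what ties activations to the spectrum of the layer Jacobian, since the f'(x) values form its diagonal part. A Monte-Carlo estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)

s = 1.0 / (1.0 + np.exp(-x))
derivs = {
    "relu": (x > 0).astype(float),  # ReLU'(x) = 1{x > 0}
    "tanh": 1.0 - np.tanh(x) ** 2,  # tanh'(x) = 1 - tanh(x)^2
    "sigmoid": s * (1.0 - s),       # sigmoid'(x) = s(1 - s)
}
for name, d in derivs.items():
    # E[f'(x)^2]: per-layer multiplier on backward gradient variance.
    # ReLU gives exactly 0.5; sigmoid is far smaller, which is the
    # vanishing-gradient story in numbers.
    print(f"{name:8s} {np.mean(d ** 2):.3f}")
```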
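Final sketch (L4): the simplest learnable activation is Swish with a trainable slope, f(x) = x * sigmoid(beta * x), the form the original Swish search converged on; beta = 1 recovers SiLU. A sketch of the forward pass and the gradient with respect to beta (the training loop is omitted):

```python
import numpy as np

def swish(x, beta):
    # Learnable Swish: x * sigmoid(beta * x); beta is trained like a weight.
    return x / (1.0 + np.exp(-beta * x))

def swish_dbeta(x, beta):
    # d/dbeta [x * sigmoid(beta * x)] = x^2 * s * (1 - s),
    # where s = sigmoid(beta * x).
    s = 1.0 / (1.0 + np.exp(-beta * x))
    return x ** 2 * s * (1.0 - s)
```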
Resources
L1 — Basics