Activation functions
ReLU, GELU, SiLU/Swish, softmax — the nonlinearities that give neural networks their expressive power.
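To make the four functions concrete, here is a minimal NumPy sketch of each, assuming the standard tanh approximation for GELU and the usual max-subtraction trick for a numerically stable softmax:

```python
import numpy as np

def relu(x):
    # Zeroes out negative pre-activations; cheap, but units can "die" (see L1).
    return np.maximum(0.0, x)

def gelu(x):
    # tanh approximation of GELU: smooth near zero, ReLU-like in the tails.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def silu(x):
    # SiLU / Swish: x * sigmoid(x).
    return x / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    # Subtract the max first so exp() cannot overflow.
    z = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return z / np.sum(z, axis=axis, keepdims=True)
```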
Depth levels
L0 — Intro (~1 h)
Knows that without a non-linearity, a stack of linear layers collapses to a single linear model (first sketch after this list).
L1 — Basics (~4 h)
Draws sigmoid, tanh, and ReLU; explains the dying-ReLU and vanishing-gradient problems (second sketch below).
L2 — Working (~8 h)
Uses GELU / SiLU in transformers; matches weight initialisation to the activation (third sketch below).
L3 — Advanced (~15 h)
Derives the gradient-scaling factor of each activation; connects it to spectral properties (fourth sketch below).
L4 — Research (~30 h)
Learnable and search-based activations; connection to scaling laws (final sketch below).
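First sketch (L0): with no non-linearity, stacking linear layers buys nothing, because the composition of linear maps is itself a single linear map. A minimal check in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 3))

# Two stacked linear layers without an activation...
deep = (x @ W1) @ W2
# ...equal exactly one linear layer with the composed weight matrix.
assert np.allclose(deep, x @ (W1 @ W2))
```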
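Second sketch (L1): sigmoid'(x) = s(1 - s) never exceeds 1/4, so backpropagating through many sigmoid layers shrinks gradients roughly geometrically (vanishing gradients), while a ReLU unit whose pre-activations are all negative passes zero gradient and can never recover (dying ReLU). A rough illustration, with hypothetical toy numbers:

```python
import numpy as np

# Vanishing gradients: the gradient magnitude after `depth` sigmoid layers
# is bounded by 0.25**depth (times the product of weight norms).
depth = 10
print("sigmoid bound:", 0.25 ** depth)  # ~9.5e-07 after just 10 layers

# Dying ReLU: if every pre-activation of a unit is negative,
# its gradient is identically zero and the unit never updates again.
pre_acts = np.array([-3.1, -0.4, -2.2, -0.7])
relu_grad = (pre_acts > 0).astype(float)
print("dead unit gradient:", relu_grad)  # all zeros
```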
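Third sketch (L2): "matching initialisation to the activation" usually means scaling the weight variance by an activation-specific gain so signal variance is roughly preserved layer to layer. A sketch assuming Glorot-style scaling with the gain values used by PyTorch's `calculate_gain`; `init_weight` is a hypothetical helper, not a named library API:

```python
import numpy as np

# Gain per activation: He gain sqrt(2) for ReLU (half the inputs are zeroed),
# the conventional 5/3 for tanh, and 1 for a linear layer.
GAINS = {"relu": np.sqrt(2.0), "tanh": 5.0 / 3.0, "linear": 1.0}

def init_weight(fan_in, fan_out, activation, rng):
    # Glorot-style scaling times the gain, compensating for how the
    # activation shrinks the variance of the signal passing through it.
    std = GAINS[activation] * np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

W = init_weight(512, 512, "relu", np.random.default_rng(0))
```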
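Fourth sketch (L3): one way to derive the gradient scaling of an activation is to estimate E[f'(x)^2] for x ~ N(0, 1), the factor by which a layer of that activation multiplies backward gradient variance; this is also what ties activations to the spectrum of the layer Jacobian, since the f'(x) values form its diagonal part. A Monte-Carlo estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)

s = 1.0 / (1.0 + np.exp(-x))
derivs = {
    "relu": (x > 0).astype(float),  # ReLU'(x) = 1{x > 0}
    "tanh": 1.0 - np.tanh(x) ** 2,  # tanh'(x) = 1 - tanh(x)^2
    "sigmoid": s * (1.0 - s),       # sigmoid'(x) = s(1 - s)
}
for name, d in derivs.items():
    # E[f'(x)^2]: per-layer multiplier on backward gradient variance.
    # ReLU gives exactly 0.5; sigmoid is far smaller, which is the
    # vanishing-gradient story in numbers.
    print(f"{name:8s} {np.mean(d ** 2):.3f}")
```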
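Final sketch (L4): the simplest learnable activation is Swish with a trainable slope, f(x) = x * sigmoid(beta * x), the form the original Swish search converged on; beta = 1 recovers SiLU. A sketch of the forward pass and the gradient with respect to beta (the training loop is omitted):

```python
import numpy as np

def swish(x, beta):
    # Learnable Swish: x * sigmoid(beta * x); beta is trained like a weight.
    return x / (1.0 + np.exp(-beta * x))

def swish_dbeta(x, beta):
    # d/dbeta [x * sigmoid(beta * x)] = x^2 * s * (1 - s),
    # where s = sigmoid(beta * x).
    s = 1.0 / (1.0 + np.exp(-beta * x))
    return x ** 2 * s * (1.0 - s)
```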
Resources
L1 — Basics