
$$\mathcal{L} = -\sum_{t=1}^{T}\sum_{\substack{-m \le j \le m \\ j \ne 0}} \log P(w_{t+j} \mid w_t)$$

Components: $P(w_{t+j} \mid w_t)$, $\log P(w_{t+j} \mid w_t)$, $\sum_{t=1}^{T}\sum_{-m \le j \le m,\, j \ne 0}$

Skip-Gram Loss


Full Formula Properties

Category: Machine Learning

👶 Baby Fast Definition

It scores how well a word guesses its friends in a sentence.

Skip-Gram trains word embeddings by treating each center word as a predictor of its neighbors within a window of size $m$. Lowering this loss nudges the vectors of related words toward nearby points in a high-dimensional embedding space.
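The loss above can be sketched numerically. This is a minimal illustration, not the reference word2vec implementation: it uses a full softmax over a tiny invented vocabulary (real systems use negative sampling or hierarchical softmax), and the function names, toy corpus, and window size are assumptions for the demo.

```python
import numpy as np

def skipgram_loss(center_vecs, context_vecs, pairs):
    """Negative log-likelihood of context words given their center words.

    center_vecs:  (V, d) input (center-word) embeddings
    context_vecs: (V, d) output (context-word) embeddings
    pairs: (center_index, context_index) pairs from the corpus
    """
    loss = 0.0
    for c, o in pairs:
        scores = context_vecs @ center_vecs[c]               # (V,) dot products
        log_probs = scores - np.log(np.sum(np.exp(scores)))  # log-softmax over vocab
        loss -= log_probs[o]                                 # -log P(w_o | w_c)
    return loss

# Toy setup: 5-word vocabulary, 3-dim embeddings, window m = 1
rng = np.random.default_rng(0)
V, d = 5, 3
center = rng.normal(size=(V, d))
context = rng.normal(size=(V, d))
# Corpus [0, 1, 2] with m = 1 yields these (center, neighbor) pairs
pairs = [(0, 1), (1, 0), (1, 2), (2, 1)]
print(skipgram_loss(center, context, pairs))  # positive scalar
```

Each training step would lower this sum by moving a center vector toward the output vectors of its observed neighbors, which is what pulls related words together.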

Role:

Measures how well a word predicts its neighbors in a sentence

Domain:

Probability space over vocabulary indices

Binding:

Links center word $w_t$ to its context words $w_{t+j}$

Variance:

Loss grows when predicted probabilities are low

Geometric:

Minimization pulls word vectors closer in embedding space

Invariant:

Sum over all possible context positions

Limits:

Loss approaches zero as predictions become certain
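The variance and limit properties follow directly from the shape of $-\log p$: the per-pair loss blows up as the predicted probability nears zero and vanishes as it nears one. A quick numeric check (values here are standard natural-log facts, not from the source):

```python
import math

# Per-pair skip-gram loss is -log P(context | center);
# watch it shrink toward zero as the prediction becomes certain.
for p in (0.5, 0.9, 0.99, 0.999):
    print(f"p = {p:<6} -log p = {-math.log(p):.4f}")
# -log 0.5 ≈ 0.6931, while -log 0.999 ≈ 0.0010
```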

Notion2Pi © 2026 — By MYH