
$$\mathcal{L} = -\sum_{t=1}^{T}\sum_{\substack{-m \le j \le m \\ j \ne 0}} \log P(w_{t+j} \mid w_t)$$

Components: $P(w_{t+j} \mid w_t)$, $\log P(w_{t+j} \mid w_t)$, $\sum_{t=1}^{T}\sum_{-m \le j \le m,\, j \ne 0}$

Skip-Gram Loss


Full Formula Properties

Category: Machine Learning

👶 Baby Fast Definition

It scores how well a word guesses its friends in a sentence.

Skip-Gram trains word embeddings by treating each center word as a predictor of its neighbors within a window of size $m$. Lowering this loss nudges the vectors of related words toward nearby points in a high-dimensional embedding space.
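The loss above can be sketched numerically. This is a minimal illustration, not the reference word2vec implementation: it uses a full softmax over a tiny invented vocabulary (real systems use negative sampling or hierarchical softmax), and the function names, toy corpus, and window size are assumptions for the demo.

```python
import numpy as np

def skipgram_loss(center_vecs, context_vecs, pairs):
    """Negative log-likelihood of context words given their center words.

    center_vecs:  (V, d) input (center-word) embeddings
    context_vecs: (V, d) output (context-word) embeddings
    pairs: (center_index, context_index) pairs from the corpus
    """
    loss = 0.0
    for c, o in pairs:
        scores = context_vecs @ center_vecs[c]               # (V,) dot products
        log_probs = scores - np.log(np.sum(np.exp(scores)))  # log-softmax over vocab
        loss -= log_probs[o]                                 # -log P(w_o | w_c)
    return loss

# Toy setup: 5-word vocabulary, 3-dim embeddings, window m = 1
rng = np.random.default_rng(0)
V, d = 5, 3
center = rng.normal(size=(V, d))
context = rng.normal(size=(V, d))
# Corpus [0, 1, 2] with m = 1 yields these (center, neighbor) pairs
pairs = [(0, 1), (1, 0), (1, 2), (2, 1)]
print(skipgram_loss(center, context, pairs))  # positive scalar
```

Each training step would lower this sum by moving a center vector toward the output vectors of its observed neighbors, which is what pulls related words together.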

Role:

Measures how well a word predicts its neighbors in a sentence

Domain:

Probability space over vocabulary indices

Binding:

Links center word $w_t$ to its context words $w_{t+j}$

Variance:

Loss grows when predicted probabilities are low

Geometric:

Minimization pulls word vectors closer in embedding space

Invariant:

Sum over all possible context positions

Limits:

Loss approaches zero as predictions become certain
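The variance and limit properties follow directly from the shape of $-\log p$: the per-pair loss blows up as the predicted probability nears zero and vanishes as it nears one. A quick numeric check (values here are standard natural-log facts, not from the source):

```python
import math

# Per-pair skip-gram loss is -log P(context | center);
# watch it shrink toward zero as the prediction becomes certain.
for p in (0.5, 0.9, 0.99, 0.999):
    print(f"p = {p:<6} -log p = {-math.log(p):.4f}")
# -log 0.5 ≈ 0.6931, while -log 0.999 ≈ 0.0010
```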

Notion2Pi © 2026 — By MYH