⚠ Switch to EXCALIDRAW VIEW in the MORE OPTIONS menu of this document. ⚠ You can decompress Drawing data with the command palette: ‘Decompress current Excalidraw file’. For more info check in plugin settings under ‘Saving’

Excalidraw Data

Text Elements

Outputs one token at a time (autoregressive generation)

Attention Is All You Need

Embedding layer: converts the input tokens into vectors
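A minimal sketch of this step, assuming PyTorch; `vocab_size` and `d_model` below are illustrative values, and the √d_model scaling is the one used in the paper:

```python
import torch
import torch.nn as nn

vocab_size, d_model = 10000, 512           # illustrative sizes, not from the drawing
embed = nn.Embedding(vocab_size, d_model)

tokens = torch.tensor([[5, 42, 7]])        # (batch=1, seq_len=3) token ids
vectors = embed(tokens) * d_model ** 0.5   # (1, 3, 512); the paper scales embeddings by sqrt(d_model)
```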

N of these stacked layers; each one can be regarded as a Transformer block

LayerNorm (layer normalization)

BatchNorm (batch normalization)

LayerNorm is comparatively more effective here, because it normalizes each token's features independently and does not rely on batch statistics, which suits variable-length sequences
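A small sketch contrasting the two, assuming PyTorch (shapes are illustrative): LayerNorm normalizes over the feature dimension of each token, while BatchNorm normalizes each feature over the batch and positions.

```python
import torch
import torch.nn as nn

x = torch.randn(2, 5, 8)  # (batch, seq_len, d_model)

# LayerNorm: mean/variance computed per token over its 8 features,
# independent of the batch and of the sequence length.
ln = nn.LayerNorm(8)
y_ln = ln(x)

# BatchNorm: mean/variance computed per feature over batch and positions,
# which is awkward when sequence lengths vary.
bn = nn.BatchNorm1d(8)                        # expects (N, C, L)
y_bn = bn(x.transpose(1, 2)).transpose(1, 2)
```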

Attention mechanism

Depending on the input query, the weights used to combine the values (computed from the keys) will differ

Note: q, k, v are all vectors

Matrix multiplication (dot products)

Matrix multiplication (dot products)

Divide by √d_k, then apply softmax

Mask: set the scores of all keys after time t to a large negative number, so softmax gives them near-zero weight
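Putting the steps above together, a minimal sketch of scaled dot-product attention with the causal mask, assuming PyTorch (the function name is illustrative):

```python
import math
import torch

def attention(q, k, v, causal=False):
    """Compute softmax(q kᵀ / √d_k) v, optionally masking future keys."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)       # (..., T, T)
    if causal:
        T = scores.size(-1)
        future = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(future, float("-inf"))  # keys after t get ~0 weight
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 4, 8)   # self-attention: all three from the same input
out = attention(q, k, v, causal=True)
```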

Multi-head attention

Concrete formula and implementation

With h = 8, there are 8 separate chances to learn the projection parameters

Projection dimension = output dimension / h; that is, when v, k, q are each projected, their dimension is divided by h (with d_model = 512 and h = 8, each head works in 64 dimensions)

q

k

v

In fact, q, k, v here all come from the same initial input (self-attention), but inside multi-head attention each is linearly projected into a different vector
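A sketch consistent with the notes above, assuming PyTorch; the class and parameter names are illustrative. The same input is projected into distinct q, k, v, and d_model is split into h heads of d_model / h dimensions each.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model=512, h=8):
        super().__init__()
        assert d_model % h == 0
        self.h, self.d_k = h, d_model // h   # per-head dimension = d_model / h
        # Separate learned projections turn the same input into different q, k, v.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x):
        B, T, _ = x.shape
        def split(t):  # split d_model into h heads of size d_k
            return t.view(B, T, self.h, self.d_k).transpose(1, 2)   # (B, h, T, d_k)
        q, k, v = split(self.w_q(x)), split(self.w_k(x)), split(self.w_v(x))
        scores = q @ k.transpose(-2, -1) / self.d_k ** 0.5
        out = torch.softmax(scores, dim=-1) @ v                     # (B, h, T, d_k)
        out = out.transpose(1, 2).reshape(B, T, self.h * self.d_k)  # concat heads
        return self.w_o(out)

x = torch.randn(1, 4, 512)
print(MultiHeadAttention()(x).shape)  # torch.Size([1, 4, 512])
```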

The more similar a key is to the query, the larger the weight placed on its value

Transformer vs. RNN comparison

attention

Positional encoding is added to the output of the embedding layer, so that sequence-order information is injected into the input
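A sketch of the sinusoidal positional encoding from the paper, assuming PyTorch (the helper name is illustrative):

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)); PE[pos, 2i+1] = cos(...)."""
    pos = torch.arange(seq_len).unsqueeze(1)   # (seq_len, 1)
    div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

# Added to the embedding output so order information enters the model:
# x = embed(tokens) + sinusoidal_positional_encoding(seq_len, d_model)
```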

Embedded Files

5cd223e4af894477844f1c3f065a07a7dac93220: transformer.png

7b5d161284aebe516758ba23d5bba865dbaea7e2: Pasted Image 20250513151345_661.png

e5f1a2d1308a18c579276935b46406c438e20c47: Pasted Image 20250513152621_763.png

d09847b3c146e7efea6d61cb310c3a324574eebb: Pasted Image 20250513154355_624.png

9cd6fa5424f377c7a4c4e02fba732d2f87ef49a6: Pasted Image 20250513155439_137.png

cc7881d2b9b5423f677e7c9605830116cbd5eef6: Pasted Image 20250513155851_709.png

6004674d714e466f164703ff19c4347fb7949c15: Pasted Image 20250513160339_539.png

59b12b0a8350653ee0b7984bc7eecb462c21d345: Pasted Image 20250513160604_621.png

92076ffba4bccd57182b1a9a2ae4e570b28e0d49: Pasted Image 20250513162226_606.png

8849c336eb5eea9b704a3e9fd76bbfa2e99c186c: Pasted Image 20250513164747_187.png