[3D Knowledge] Attention vs Linear Attention

KalelPark's LAB

kalelpark 2025. 12. 18. 18:17

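Softmax Attention
- Usually just called "Attention". (Do not confuse it with Linear Attention.)
- For a sequence of length N, it stores all past information (every key and value) without compressing it.
- Each query selectively accesses that stored information through its own row of softmax weights, so the score matrix is N x N.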
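
The points above can be made concrete with a small sketch. The sizes `B`, `N`, and `D` below are arbitrary values chosen for illustration, not values from the post. Each query gets its own row of N weights that sums to 1, which is the "selective access" described above, and the total number of weights grows as N squared.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, N, D = 1, 5, 8  # illustrative sizes (assumption, not from the post)
Q = torch.randn(B, N, D)
K = torch.randn(B, N, D)

# One score per (query, key) pair: nothing from the past is compressed.
scores = torch.matmul(Q, K.transpose(-1, -2)) / (D ** 0.5)  # (B, N, N)
attn = F.softmax(scores, dim=-1)                            # (B, N, N)

row_sums = attn.sum(dim=-1)   # (B, N): each query's weights sum to 1
num_weights = attn.numel()    # B * N * N entries in total
```

Doubling `N` quadruples `num_weights`, which is where the quadratic cost of softmax attention comes from.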

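Linear Attention
- A variant that approximates or reformulates attention so that it runs in time linear in N.
- Fast, but weaker on long sequences, because the entire past is compressed into a fixed-size state.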
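
To see where the linear cost comes from, it may help to write both forms out for a single query. The notation below is an assumption introduced for illustration: q_i, k_j, v_j are the rows of Q, K, V, o_i is the output row for query i, and \phi is a positive feature map such as elu(x) + 1. The small eps that a practical implementation adds to the denominator is omitted.

```latex
% Softmax attention: every query i needs a score against every key j.
o_i^\top = \sum_{j=1}^{N}
  \frac{\exp\!\left(q_i^\top k_j / \sqrt{D}\right)}
       {\sum_{l=1}^{N} \exp\!\left(q_i^\top k_l / \sqrt{D}\right)} \, v_j^\top

% Linear attention: replace the exponential with \phi(q_i)^\top \phi(k_j),
% then pull \phi(q_i)^\top out of both sums (matrix products are associative).
o_i^\top
  = \frac{\sum_{j=1}^{N} \left(\phi(q_i)^\top \phi(k_j)\right) v_j^\top}
         {\sum_{l=1}^{N} \phi(q_i)^\top \phi(k_l)}
  = \frac{\phi(q_i)^\top \left(\sum_{j=1}^{N} \phi(k_j)\, v_j^\top\right)}
         {\phi(q_i)^\top \left(\sum_{l=1}^{N} \phi(k_l)\right)}
```

The two sums in the last expression do not depend on i, so they are computed once and shared by all queries: a D x Dv matrix and a length-D vector. That brings the cost to roughly O(N D Dv) instead of O(N^2 D), but it also means the past is only visible through this fixed-size summary.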

The code is as follows.

import torch
import torch.nn.functional as F

def softmax_attention(Q, K, V):
    """
    Q: (B, N, D)
    K: (B, N, D)
    V: (B, N, Dv)
    """
    D = Q.shape[-1]
    scores = torch.matmul(Q, K.transpose(-1, -2)) / (D ** 0.5)  # (B, N, N)
    attn = F.softmax(scores, dim=-1)                            # (B, N, N)
    out = torch.matmul(attn, V)                                 # (B, N, Dv)
    return out
    
    
def linear_attention(Q, K, V, eps=1e-6):
    """
    Q: (B, N, D)
    K: (B, N, D)
    V: (B, N, Dv)
    """
    # feature map (e.g. elu + 1), which keeps every feature strictly positive
    phi = lambda x: F.elu(x) + 1

    Q_phi = phi(Q)  # (B, N, D)
    K_phi = phi(K)  # (B, N, D)

    # 🔥 Compress the past into a fixed-size state first
    KV = torch.matmul(K_phi.transpose(-1, -2), V)  # (B, D, Dv)

    # Read it out with the queries
    # Z is the per-query normalizer, shape (B, N, 1)
    Z = 1 / (torch.matmul(Q_phi, K_phi.sum(dim=1, keepdim=True).transpose(-1, -2)) + eps)
    out = torch.matmul(Q_phi, KV) * Z               # (B, N, Dv)

    return out
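
As a sanity check, the sketch below compares the linear-time form with an explicit N x N computation that uses the same feature map. `quadratic_reference` is a hypothetical helper written for this comparison, and the tensor sizes are arbitrary. `linear_attention` is repeated here only so that the snippet runs on its own.

```python
import torch
import torch.nn.functional as F


def phi(x):
    # Same feature map as in the post: elu(x) + 1, strictly positive.
    return F.elu(x) + 1


def linear_attention(Q, K, V, eps=1e-6):
    # Linear-time form: compress keys/values into a (B, D, Dv) state first.
    Q_phi, K_phi = phi(Q), phi(K)
    KV = torch.matmul(K_phi.transpose(-1, -2), V)                 # (B, D, Dv)
    K_sum = K_phi.sum(dim=1, keepdim=True)                        # (B, 1, D)
    Z = 1 / (torch.matmul(Q_phi, K_sum.transpose(-1, -2)) + eps)  # (B, N, 1)
    return torch.matmul(Q_phi, KV) * Z                            # (B, N, Dv)


def quadratic_reference(Q, K, V, eps=1e-6):
    # Hypothetical helper: identical weights, built as an explicit (B, N, N) matrix.
    Q_phi, K_phi = phi(Q), phi(K)
    weights = torch.matmul(Q_phi, K_phi.transpose(-1, -2))        # (B, N, N)
    weights = weights / (weights.sum(dim=-1, keepdim=True) + eps)
    return torch.matmul(weights, V)                               # (B, N, Dv)


torch.manual_seed(0)
B, N, D, Dv = 2, 16, 8, 4  # illustrative sizes (assumption)
Q, K, V = torch.randn(B, N, D), torch.randn(B, N, D), torch.randn(B, N, Dv)

out_linear = linear_attention(Q, K, V)
out_quadratic = quadratic_reference(Q, K, V)
max_diff = (out_linear - out_quadratic).abs().max().item()
```

The two outputs should agree up to floating-point error. The only difference is the order of the matrix products, so the linear form never materializes the (B, N, N) matrix.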
