Multi-Head RAG: Solving Multi-Aspect Problems with LLMs

June 14, 2024

Meta info.

Authors: Maciej Besta, Ales Kubicek, Roman Niggli, Robert Gerstenberger, Lucas Weitzendorf, Mingyuan Chi, Patrick Iff, Joanna Gajda, Piotr Nyczyk, Jürgen Müller, Hubert Niewiadomski, Marcin Chrapek, Michał Podstawski, Torsten Hoefler
Paper: https://arxiv.org/pdf/2406.05085
Affiliation: Cledar, ETH Zurich
Published: June 7, 2024
Code: https://github.com/spcl/MRAG

TL;DR

A retriever study that uses the multi-head attention layer to enable intuitive multi-document RAG and knowledge integration.


Multi-aspect retrieval using the decoder's multi-head attention layer (pic2)
- Based on the intuition that different heads attend to different parts of the input
- Uses the per-head embeddings of the last token, taken before the outputs of all attention heads are mixed
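
To make the per-head embedding idea concrete, here is a minimal sketch. It assumes the concatenated head outputs for the last token (taken before the output projection mixes them) are already available as a flat list of floats; how to extract them from a particular model is not covered here. The function name `split_into_heads` and the toy numbers are hypothetical.

```python
def split_into_heads(last_token_vec, num_heads):
    """Split a concatenated attention output into one vector per head.

    Assumes len(last_token_vec) == num_heads * d_head, i.e. the vector is
    the concatenation of the head outputs before any mixing.
    """
    if len(last_token_vec) % num_heads != 0:
        raise ValueError("vector length must be divisible by num_heads")
    d_head = len(last_token_vec) // num_heads
    return [last_token_vec[i * d_head:(i + 1) * d_head] for i in range(num_heads)]


# Toy example: 2 heads with head dimension 3 (made-up values).
concat = [0.1, 0.2, 0.3, 1.0, 2.0, 3.0]
heads = split_into_heads(concat, num_heads=2)
```

Each entry of `heads` would then be stored and searched as its own embedding, so a single chunk ends up with one vector per head instead of one vector overall.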
embedding: uses models such as SFR-Embedding-Model and e5-mistral-7b-instruct; computed from all attention-head outputs for the last token
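importance score: $a_i \cdot b_i$ (pic3)
- $a_i$: the accumulated L2 norm of all embeddings in head $i$ (reflecting how much the model attends to that head) → acts as a kind of weight
- $b_i$: the average cosine distance between embeddings in head $i$ → a measure of how spread out they are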
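
The score $a_i \cdot b_i$ can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: $a_i$ is taken as the sum of L2 norms, $b_i$ is averaged over unordered pairs of distinct embeddings, and every embedding is assumed to be non-zero with at least two embeddings per head. The helper names are hypothetical.

```python
import math
from itertools import combinations


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def cosine_distance(u, v):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(u, v))
    return 1.0 - dot / (l2_norm(u) * l2_norm(v))


def head_importance(embeddings):
    """Return a_i * b_i for the embeddings belonging to one head."""
    # a_i: accumulated L2 norm (assumption: plain sum, as in the note above).
    a = sum(l2_norm(e) for e in embeddings)
    # b_i: mean cosine distance over unordered pairs (assumption).
    pairs = list(combinations(embeddings, 2))
    b = sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)
    return a * b


# Orthogonal embeddings are maximally spread; parallel ones are not.
spread_head = head_importance([[1.0, 0.0], [0.0, 1.0]])
collapsed_head = head_importance([[3.0, 0.0], [6.0, 0.0]])
```

In this toy case the head whose embeddings all point the same way gets a score of zero despite its larger norms, which matches the role of $b_i$ as a spread term.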
voting: sort candidates by the scores above and select the highest-scoring ones (pic4)
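
A minimal sketch of such a voting step is shown below. The note does not spell out how a chunk's rank within a head contributes to its vote, so the sketch assumes a weight that halves with each rank position and is scaled by the head's importance score; that decay, the alphabetical tie-break, and the names `vote`, `per_head_rankings`, and `head_scores` are illustrative assumptions.

```python
from collections import defaultdict


def vote(per_head_rankings, head_scores, top_k):
    """Merge per-head rankings into one list of chunk ids.

    per_head_rankings: one ranked list of chunk ids per head (best first).
    head_scores: importance score s_i for each head, in the same order.
    """
    votes = defaultdict(float)
    for ranking, s in zip(per_head_rankings, head_scores):
        for p, chunk_id in enumerate(ranking):
            # Assumed weighting: head score, halved for each rank position.
            votes[chunk_id] += s * 2.0 ** (-p)
    # Highest vote first; ties broken by chunk id for determinism.
    ranked = sorted(votes, key=lambda c: (-votes[c], c))
    return ranked[:top_k]


# Toy example with two heads and made-up scores.
rankings = [["A", "B"], ["C", "A"]]
scores = [2.0, 1.0]
selected = vote(rankings, scores, top_k=2)
```

Here chunk "A" collects votes from both heads (2.0 + 0.5), so it ranks first even though the second head placed it below "C".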

- Can be integrated with **RAGAS**

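Personal note. This looks like an early experiment on an intuitive idea for multi-doc RAG. There seems to be room for improvement, for example in the use of a single fixed chunk granularity and in the fact that the number of aspects in the proposed idea appears to be tied to the number of heads.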