
Meta info.

TL;DR

When using a long-context LLM (LC-LLM) in RAG, it performs better if you (1) order the retrieved contexts well, (2) fine-tune it in a RAG style, and (3) give it a reasoning step in which it explicitly judges whether each context is relevant.


Problem Statement

When an LC-LLM is used in a RAG system, generation performance drops once the number of retrieved contexts becomes too large.

  • Research Question:
    1. When an LC-LLM is used in RAG, does performance improve consistently as the amount of retrieved context grows? > No
    2. Is the performance bottleneck (observed in RQ1) a limitation of the retriever, or a limitation of the LC-LLM's ability to use the retrieved context effectively? > Probably the LLM
    3. (To address this LC-LLM limitation) RAG systems usually aim for high recall, which raises the chance of including hard negatives, and that is likely the cause. 1) (Does this assumption hold?) How robust are current LC-LLMs to such hard negatives? > Vulnerable 2) (If so) Does the impact of hard negatives depend on the retriever used? > Yes

Suggestions

  • Observation:
    • (RQ1) When an LC-LLM is used in RAG, more retrieved context does not always mean better performance, so other factors have to be considered.
      • Figure 1: On NQ, RAG performance with a strong retriever follows a concave (rise-then-fall) curve as the number of retrieved contexts grows, whereas with a weak retriever it keeps rising or drops only slightly.
    • (RQ2) The performance bottleneck is a limitation of the LC-LLM.
      • Figure 2: Overall RAG accuracy is lower than retrieval recall at every amount of retrieved context, which is read as the LC-LLM failing to use the answer even when it is present in the context.
        • In other words, irrelevant context (hard negatives) can be critical.
      • With e5 as the retriever, the performance drop at larger amounts of retrieved context was reportedly bigger than with BM25.
    • (RQ3) The importance of hard negatives
      • Figure 3: For every LLM, RAG performance generally decreases as the number of hard negative contexts increases.
        • LLMs: Gemma-2-9B-Chat, Mistral-Nemo-12B-Instruct, Gemini-1.5-Pro
        • Hard negative context setup: gold passage (the passage containing the answer) + hard negative retrieved contexts (e5, Contriever, BM25, random sampling)
      • Retriever strength correlates directly with the difficulty of its hard negatives.
        • LLMs are challenged more by hard negative contexts from a strong retriever (e5) than by contexts from a weak retriever (BM25 or random sampling). (This is expected, but the authors seem to have wanted to show how damaging it is.)
  • Methods:
    1. Reranking (reordering) to mitigate lost-in-the-middle: arrange the input as [Instruction, rank_1, rank_3, … rank_4, rank_2], so that the highest-ranked contexts sit at the two ends.
    2. Fine-tuning for implicit robustness: handling noisy retrieved context is not learned during pretraining, so the model has to be fine-tuned for it (robustness to hard negatives).
    3. Fine-tuning for explicit robustness: the LLM additionally performs intermediate reasoning in which it explicitly judges whether each context is relevant (again via tuning).
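
The ordering in method 1 can be sketched in a few lines of Python. This is a minimal illustration, assuming the contexts arrive already sorted from most to least relevant. The function names, the prompt template, and placing the question at the end are assumptions made for the example, not details taken from the paper.

```python
from typing import List, Sequence


def reorder_by_rank(passages: Sequence[str]) -> List[str]:
    """Place odd ranks at the front and even ranks, reversed, at the back.

    `passages` must already be sorted from most to least relevant
    (rank_1 first). The result is rank_1, rank_3, ..., rank_4, rank_2.
    """
    front = list(passages[0::2])        # rank_1, rank_3, rank_5, ...
    back = list(passages[1::2])[::-1]   # ..., rank_6, rank_4, rank_2
    return front + back


def build_prompt(instruction: str, passages: Sequence[str], question: str) -> str:
    """Join instruction, reordered passages, and question (hypothetical template)."""
    ordered = reorder_by_rank(passages)
    context = "\n".join(f"Passage: {p}" for p in ordered)
    return f"{instruction}\n{context}\nQuestion: {question}"


ranked = ["d1", "d2", "d3", "d4", "d5"]
print(reorder_by_rank(ranked))  # ['d1', 'd3', 'd5', 'd4', 'd2']
```

The least relevant contexts end up in the middle of the prompt, which is where a lost-in-the-middle effect would hurt the least.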

Effects

  • (Suggestion 1): Reranking helps more as the number of retrieved contexts grows (Figure 4)
    • Gemma-2-9B-Chat and Mistral-Nemo-12B-Instruct were tested on NQ / PopQA with contexts retrieved by BM25 or e5
    • It appears to mitigate lost-in-the-middle and to allow hard negative contexts to be handled strategically
      • In other words, this can be read as underscoring how much an engineering-style approach matters in RAG
  • (Suggestion 2): Fine-tuning for implicit robustness is effective (Figure 5)
    • RAG-style tuning on NQ, WoW, Fever, MMLU, and similar datasets, followed by evaluation on QA sets not seen during tuning, shows a large performance gain
      • Consistently more effective than fine-tuning directly on that QA set
  • (Suggestion 3): Having the model explicitly judge relevance helps final performance (Figure 6)
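
To make the difference between Suggestions 2 and 3 concrete, here is a rough sketch of how one supervised fine-tuning example could be formatted in each case. Everything in it is hypothetical: the `build_example` helper, the passage numbering, and the "Relevant passages / Answer" target layout are illustrative assumptions, not the data format used in the paper.

```python
from typing import Dict, Sequence


def format_input(instruction: str, passages: Sequence[str], question: str) -> str:
    """Number the passages so a reasoning target can refer to them."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return f"{instruction}\n{numbered}\nQuestion: {question}"


def build_example(
    instruction: str,
    passages: Sequence[str],
    question: str,
    answer: str,
    relevant_ids: Sequence[int] = (),
    explicit: bool = False,
) -> Dict[str, str]:
    """Return one input/target pair for supervised fine-tuning.

    explicit=False: the target is the answer only (implicit robustness).
    explicit=True: the target first states which passages are relevant,
    then gives the answer (explicit relevance reasoning).
    """
    target = f"Answer: {answer}"
    if explicit:
        if relevant_ids:
            refs = ", ".join(f"[{i}]" for i in relevant_ids)
        else:
            refs = "none"
        target = f"Relevant passages: {refs}\n{target}"
    return {"input": format_input(instruction, passages, question), "target": target}


example = build_example(
    "Answer the question using the passages.",
    ["Paris is the capital of France.", "Lyon is a city in France."],
    "What is the capital of France?",
    "Paris",
    relevant_ids=[1],
    explicit=True,
)
print(example["target"])
```

In both modes the input mixes relevant passages with distractors. Only the target changes: the implicit variant trains the model to answer despite the noise, while the explicit variant also trains it to name the passages it relied on before answering.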

Personal note. RAG papers tend to come out of Google Cloud rather than Google Research. Could that mean RAG has moved even closer to engineering…?