
Meta info.

TL;DR

Proposes NeedleBench, a benchmark that evaluates long-context ability (1) across multiple length intervals, (2) over various depth ranges, (3) with progressively harder tasks, and (4) in two languages (English/Chinese), and reports evaluation results for a range of models.

Suggestion

  • Task Details:
    • Single-Needle Retrieval (S-RT) Task: does the LLM remember a single key piece of information?
      • Insert the information at various positions in a long text and check whether the model finds the needle correctly across that whole range.
    • Multi-Needle Retrieval (M-RT) Task: does the LLM retrieve multiple related pieces of information well?
      • Check whether the model extracts multiple data points from a comprehensive document.
    • Multi-Needle Reasoning (M-RS) Task: can the LLM extract information and then reason over it?
      • Evaluates answers that require combining an understanding of different parts of the text with reasoning over the extracted information.
  • Dataset Construction:
    • needle design
      • Needles span 1 to 5+ reasoning steps; most require 1 to 2 steps.
      • abstract/nonexistent needle: needles are built from abstract or fictional information so that the model's internal knowledge does not interfere with measuring its retrieval ability.
      • For M-RT, built from the R4C dataset, an improved version of HotpotQA.
        • Processed to remove pronouns and similar references.
        • Translated into Chinese.
    • haystack design
      • Uses PaulGrahamEssays to extend the prompt up to the target length.
      • Context lengths from 32K to 200K.
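The haystack and needle construction above can be sketched roughly as follows. This is a minimal illustration, not NeedleBench's code: it assumes whitespace-split words stand in for model tokens (the real benchmark measures length with a tokenizer) and that each needle is placed at a relative depth (0.0 = start, 1.0 = end). `build_haystack` and `insert_needles` are hypothetical helper names.

```python
from typing import List, Sequence


def build_haystack(corpus: str, target_len: int) -> List[str]:
    """Repeat the corpus (e.g. essay text) until it reaches target_len "tokens".

    Assumption: whitespace-separated words approximate model tokens.
    """
    words = corpus.split()
    if not words:
        raise ValueError("corpus must not be empty")
    out: List[str] = []
    while len(out) < target_len:
        out.extend(words)
    return out[:target_len]  # truncate to the exact target length


def insert_needles(haystack: List[str], needles: Sequence[str],
                   depths: Sequence[float]) -> str:
    """Insert each needle at a relative depth of the haystack.

    One needle approximates single-needle retrieval; several approximate
    multi-needle retrieval. Depths are relative to the original haystack.
    """
    if len(needles) != len(depths):
        raise ValueError("need one depth per needle")
    words = list(haystack)
    # Insert the deepest needle first so the indices of shallower ones stay valid.
    for needle, depth in sorted(zip(needles, depths), key=lambda p: p[1], reverse=True):
        if not 0.0 <= depth <= 1.0:
            raise ValueError("depth must be in [0, 1]")
        idx = round(depth * len(haystack))
        words[idx:idx] = needle.split()
    return " ".join(words)
```

For example, `insert_needles(build_haystack(essay_text, 32_000), ["The secret fruit is a glowberry."], [0.5])` places a fictional needle in the middle of a roughly 32K-word prompt. Sweeping `depths` over values such as 0.0, 0.1, ..., 1.0 and `target_len` over several context lengths yields the length × depth grid described in the TL;DR.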

Effects

  • Results
    • metric: Recall Acc, sequential averaging, overall score
    • ๊ธธ์ด ๋ณ€ํ™” ์˜ํ–ฅ:
      • (32K)ย InternLM2-7B-200K: S-RT์—์„œ ์šฐ์ˆ˜ํ•œ๋ฐ ๋ฐ˜ํ•ด M-RT์—์„œ ์ƒ๋‹นํ•œ ์„ฑ๋Šฅ ์ €ํ•˜
      • (200K)ย Orion-14B-LongChat: M-RT๋Š” ์ž˜ ํ•˜๋Š”๋ฐ , S-RT, ํŠนํžˆ context ๊ธธ์ด 80K ํ† ํฐ ์ด์ƒ์—์„œ ํ•œ๊ณ„
      • (1000K)ย GLM4-9B-Chat-1Mย ยซย InternLM2-7B-200K
    • ๋ชจ๋ธ ์‚ฌ์ด์ฆˆ ์˜ํ–ฅ: ๋น„๋ก€ํ•ด์„œ average score ์ƒ์Šนํ•˜์ง€๋งŒ, 8K ์ด๋‚ด๋ฉด ํŒŒ๋ผ๋ฏธํ„ฐ ์‚ฌ์ด์ฆˆ ์˜ํ–ฅ์€ ์ƒ๋Œ€์ ์œผ๋กœ ์ž‘์€๋“ฏ
    • needle ์œ„์น˜ ์˜ํ–ฅ: ์•ž๋ถ€๋ถ„์— ์žˆ์„์ˆ˜๋ก ๋ถˆ๋ฆฌ. (InternLM2-7B-200Kย M-RT์—์„œ ์˜ˆ์™ธ)
    • ATC ์„ฑ๋Šฅ: reasoning path์ถ”๊ฐ€ํ•˜๋ฉด ATC ์„ฑ๋Šฅ ํฌ๊ฒŒ ํ–ฅ์ƒ (Claude-3-Opus ๋“ฑ)
      • ATC: long context์— ์กด์žฌํ•  ๊ฐ€๋Šฅ์„ฑ์ด ๋†’์€ ๋…ผ๋ฆฌ์  ์ถ”๋ก  ๊ณผ์ œ์˜ ๋ณต์žก์„ฑ์„ ๋ชจ๋ฐฉํ•œ task๋กœ, multi-step reasoning ํ‰๊ฐ€ (์—ฐ์‡„์ ์ธ ๋…ผ๋ฆฌ๋ฅผ ๊ตฌ์ถ•ํ•œ๋‹ค๋˜๊ฐ€, ์ŠคํŠธ๋ ˆ์Šค ํ…Œ์ŠคํŠธ๋ฅผ ํ•œ๋‹ค๋˜๊ฐ€ ๋ฉ€ํ‹ฐ ์ดˆ์ด์Šค๋ฅผ ์‹œํ‚ค๋Š” ๋“ฑ)
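The metric line (Recall Acc, sequential averaging, overall score) is terse, so here is one possible reading as code. This is a sketch under loud assumptions: recall accuracy is taken as the fraction of expected answer keywords found in the model output (case-insensitive substring match), "sequential averaging" is interpreted as averaging over depths within each context length and then over lengths, and the overall score is an unweighted mean across tasks. The paper may define or weight these differently; all function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def recall_accuracy(answer: str, keywords: List[str]) -> float:
    """Fraction of expected keywords present in the answer.

    Assumption: case-insensitive substring match stands in for the real grader.
    """
    if not keywords:
        raise ValueError("keywords must not be empty")
    hits = sum(1 for k in keywords if k.lower() in answer.lower())
    return hits / len(keywords)


def task_score(results: Dict[Tuple[int, float], float]) -> float:
    """Assumed "sequential averaging": mean over depths per length, then over lengths.

    results maps (context_length, depth) -> recall accuracy in [0, 1].
    """
    by_length: Dict[int, List[float]] = {}
    for (length, _depth), acc in results.items():
        by_length.setdefault(length, []).append(acc)
    return mean(mean(accs) for accs in by_length.values())


def overall_score(task_scores: Dict[str, float]) -> float:
    """Unweighted mean across tasks (assumption; real task weights may differ)."""
    return mean(task_scores.values())
```

Averaging per length first keeps a context length with many depth samples from dominating the task score; with `{(32000, 0.0): 1.0, (32000, 0.5): 0.5, (128000, 0.0): 0.0, (128000, 0.5): 1.0}` the per-length means are 0.75 and 0.5, so `task_score` returns 0.625.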
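The chained-logic idea behind ATC can be illustrated with a toy generator. This is a simplified sketch, not the benchmark's actual templates: it assumes every needle has the hypothetical form "X is Y's parent.", builds a single lineage, and shuffles the needles so their order gives nothing away. Answering requires one reasoning hop per link, so the chain length controls the number of reasoning steps. `make_chain_needles` and `trace_eldest` are hypothetical names.

```python
import random
from typing import Dict, List, Tuple

SUFFIX = "'s parent."  # assumed needle template: "<parent> is <child>'s parent."


def make_chain_needles(names: List[str], seed: int = 0) -> Tuple[List[str], str, str]:
    """Build shuffled parent-link needles along one lineage.

    Returns (needles, question_person, answer): tracing upward from the last
    name must reach the first name, the eldest traceable ancestor.
    """
    if len(names) < 2:
        raise ValueError("need at least two names")
    needles = [f"{parent} is {child}{SUFFIX}" for parent, child in zip(names, names[1:])]
    random.Random(seed).shuffle(needles)  # order in the text should not reveal the chain
    return needles, names[-1], names[0]


def trace_eldest(needles: List[str], person: str) -> str:
    """Reference solver: follow child -> parent links until no parent is known."""
    parent_of: Dict[str, str] = {}
    for sentence in needles:
        parent, rest = sentence.split(" is ", 1)
        if not rest.endswith(SUFFIX):
            raise ValueError(f"unexpected needle: {sentence!r}")
        parent_of[rest[: -len(SUFFIX)]] = parent
    steps = 0
    while person in parent_of:
        person = parent_of[person]
        steps += 1
        if steps > len(parent_of):  # more hops than links means a cycle
            raise ValueError("cycle detected")
    return person
```

With five names the model must chain four links, which corresponds to the multi-step end of the 1 to 5+ reasoning-step range mentioned under needle design.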