
TL;DR

์ด์ „ ๊ณต๊ฐœํ–ˆ๋˜ ๋ชจ๋ธ(Chat QA 1.5)์„ LLaMA3-70B์˜ context length ํ™•์žฅํ•˜๋ฉด์„œ instruction following / RAG capability ํ–ฅ์ƒ์‹œํ‚ค๋Š” ๋ฐฉ๋ฒ• ์ œ์‹œ


Suggestions

  • Extend Llama3-70B's input length from 8K to 128K
    • Continual pretraining on SlimPajama
    • Using special tokens at document boundaries is more effective than BOS/EOS: Llama3's <BOS> and <EOS> tokens signal the pretrained model to ignore the preceding text chunk, which is inefficient for long contexts
  • Instruction tuning to improve RAG + long-context capability
    • Stage 1: instruction tuning at 128K
    • Stage 2: training on a mix of context + conversational QA data (inputs up to 4K)
    • Stage 3: collecting 128K SFT data (?)
  • Long-context retriever
    • A long-context retriever instead of a top-k chunk-wise retriever
      • Longer chunks performed better; balancing cost in terms of total tokens used, they adopt a chunk size of 1200 with top-5 retrieval
      • Encoder: E5-mistral
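The chunk-size-1200 + top-5 strategy above can be sketched as follows. This is a minimal illustration with hypothetical helper names; the actual paper embeds chunks with E5-mistral, which is stubbed out here as plain embedding arrays.

```python
import numpy as np

def chunk_tokens(tokens, chunk_size=1200):
    """Split a token sequence into fixed-size chunks (last one may be shorter)."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

def top_k_chunks(query_emb, chunk_embs, k=5):
    """Rank chunks by cosine similarity to the query and return top-k indices.

    query_emb: (d,) embedding of the query (e.g. from E5-mistral)
    chunk_embs: (n, d) embeddings, one row per chunk
    """
    q = query_emb / np.linalg.norm(query_emb)
    c = chunk_embs / np.linalg.norm(chunk_embs, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity per chunk
    return np.argsort(-scores)[:k]      # indices of the k best chunks
```

The selected chunks would then be concatenated (in document order) into the model's 128K input window; the trade-off the authors report is between chunk length and the total token budget spent on retrieved context.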

Effect

Accuracy comparable to GPT-4-Turbo (2024-04-09)

Personal note. It reads close to a technical report, and there isn't much that counts as a genuinely new insight, but they keep working on improving conversational capability with smaller models, so I'm following along.