1 minute read

Meta info.

TL;DR

A method for merging multiple existing LLMs (source LLMs) that have different architectures and were trained in different ways into a single stronger model (pic1). The paper proposes externalizing the knowledge of the source LLMs and transferring their capabilities into a new LLM (the target LLM) (pic2).

(pic1)

(pic2)

(pic3)

(pic4)

Suggestions

  • source LLMs: each is trained or fine-tuned individually on different datasets, so each has different strengths and knowledge. Before fusing, each source LLM produces predictions on a portion of the data (to probe what each LLM knows, i.e., where its strengths lie); the predictions are then evaluated, and the most accurate one is used to train the target LLM via next-token prediction (the causal language modeling objective, i.e., minimizing the negative log-likelihood).
  • target LLM: the LLM built by fusing the source LLMs. The final objective is to minimize the divergence between the source and target predictions.
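The two bullets above can be sketched in PyTorch. This is a minimal illustration under simplifying assumptions, not the paper's implementation: the names `fuse_distributions`, `fusellm_loss`, and the weight `lam` are hypothetical, vocabulary alignment between source tokenizers is omitted, and the best source is selected per sequence by lowest cross-entropy for brevity.

```python
# Hypothetical sketch of a FuseLLM-style objective. Assumes we already have
# token-level probability distributions from each source LLM over a shared
# vocabulary (the paper's tokenizer/vocabulary alignment step is omitted).
import torch
import torch.nn.functional as F


def fuse_distributions(source_probs, gold_ids):
    """Pick, per sequence, the source distribution with the lowest
    cross-entropy against the gold tokens (a MinCE-style selection)."""
    # source_probs: list of (batch, seq_len, vocab) probability tensors
    # gold_ids: (batch, seq_len) gold token ids
    ces = []
    for probs in source_probs:
        # per-sequence cross-entropy of this source's predictions
        ce = F.nll_loss(
            probs.log().flatten(0, 1), gold_ids.flatten(), reduction="none"
        ).view(gold_ids.shape).mean(dim=1)  # (batch,)
        ces.append(ce)
    best = torch.stack(ces, dim=0).argmin(dim=0)  # (batch,) best source index
    stacked = torch.stack(source_probs, dim=0)    # (n_src, batch, seq, vocab)
    return stacked[best, torch.arange(gold_ids.size(0))]  # (batch, seq, vocab)


def fusellm_loss(target_logits, fused_probs, gold_ids, lam=0.9):
    """Combine the usual causal-LM NLL with a divergence term that pulls
    the target's distribution toward the fused source distribution."""
    log_probs = F.log_softmax(target_logits, dim=-1)
    nll = F.nll_loss(log_probs.flatten(0, 1), gold_ids.flatten())
    kl = F.kl_div(log_probs, fused_probs, reduction="batchmean")
    return lam * nll + (1 - lam) * kl
```

Here `lam` trades off plain next-token prediction against matching the fused source distribution; setting `lam=1` recovers ordinary causal LM training.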

Effects

Using Llama-2, MPT, and OpenLLaMA as source models, the method is evaluated on BBH, CS, ME, and other tasks (pic3, 4). Overall, the proposed FuseLLM yields an average performance improvement of about 6.36%.

  • BBH: FuseLLM generally improves over Llama-2 (the strongest source) by 5.16%. The lower performance on some tasks such as Dyck Languages is attributed to the other source LLMs performing poorly there, or to the continual-training data not being closely related to those tasks.
  • CS: consistently better performance, with larger gains on harder tasks such as ARC and OpenBookQA.
  • ME (Code Generation): FuseLLM outperforms Llama-2, but there is still room for improvement.
  • See the appendix for results on other tasks.