less than 1 minute read

Meta info.

TL; DR

์ƒˆ๋กœ ์ถ”๊ฐ€ํ•œ ๋ธ”๋ก์˜ ๋งค๊ฐœ๋ณ€์ˆ˜๋งŒ ๋„๋ฉ”์ธ ๋ฐ์ดํ„ฐ๋กœ ์—…๋ฐ์ดํŠธํ•˜๋Š” post-pretraining ๋ฐฉ์‹์˜ block expansion์ด domain-specific task์— ํŠนํžˆ ์œ ์šฉํ•˜๋‹ค๊ณ  ์ œ์•ˆ. ์ „์ฒด๋ฅผ finetuningํ•  ๋•Œ ๋ฐœ์ƒ๋˜๋Š” ๋ง๊ฐ์ด ์ผ์–ด๋‚˜์ง€ ์•Š๋Š”๋‹ค๊ณ . ๋™์ผ ๋ฐ์ดํ„ฐ ์‚ฌ์šฉ์„ ์ „์ œํ–ˆ์„ ๋•Œ LLaMA2 ๋ณด๋‹ค ์ผ๊ด€๋˜๊ฒŒ outperform, ํŠนํžˆ expansion ๊ณผ์ •์—์„œ ๋” ๋งŽ์€ ์ง€์‹์„ ์Šต๋“ํ•œ๋‹ค๊ณ . ๋‹ค๋งŒ ์ง€์ ๋œ ๋ฐ”์™€ ๊ฐ™์ด ๋ธ”๋ก ์ˆ˜๋ฅผ task ๋ณ„๋กœ ์กฐ์ •ํ•ด์•ผํ•˜๋Š” ๊ฒƒ์€ ํ•œ๊ณ„.

Untitled

Untitled

Untitled

LLaMA Pro: Progressive LLaMA with Block Expansion

Suggestions

  • P=1, M=4, N=3์œผ๋กœ ์‹คํ—˜(pic2 ๋ณ€์ˆ˜ ์ฐธ๊ณ ), LLaMA2-7B ๋ชจ๋ธ์˜ 32๋ธ”๋ก์—์„œ 40๊ฐœ๋กœ ํ™•์žฅ, ์ฆ‰ ๊ฐ ๊ทธ๋ฃน(?)์ด 4๋ธ”๋ก์—์„œ 5๋ธ”๋ก์œผ๋กœ ํ™•์žฅ.
  • identity block: ์›๋ž˜ ๋ชจ๋ธ์˜ block copy & insertํ•˜๋Š” ๊ณผ์ •์—์„œ ํ™•์žฅ๋œ block. ๋ธ”๋ก ๋‚ด multi-head self-attention & FFN๊ฐ€ input์˜ RMSNorm์„ ์ทจํ•  ๋•Œ 0์„ ์ถœ๋ ฅํ•˜๋„๋ก initialize(by-pass)
  • code task ์ฒ˜๋Ÿผ ์ „๋ฌธ ์šฉ์–ด ํ˜น์€ ํŠน์ˆ˜ domain์— ์ ์šฉ์‹œ ์œ ๋ฆฌ (pic1)

Personal note. ์—…์Šคํ…Œ์ด์ง€ ๋ชจ๋ธ๊ณผ ํฐ ๋งฅ๋ฝ์€ ๋น„์Šทํ•œ ๊ฒƒ ๊ฐ™์€๋ฐ (๋ชจ๋ธ์„ ์–ด๋–ป๊ฒŒ ํšจ๊ณผ์ ์œผ๋กœ ์ž˜ ๋Š˜๋ฆด์ง€?)