
Meta info.

TL;DR

With roughly 10x fewer parameters (13B), LLaMA outperforms GPT-3 175B on almost every benchmark.

(pic 1)

(pic 2)

(pic 3)

(pic 4)

Suggestions

  • Trained on 1T tokens (pic 3 shows the 7B model still improving beyond 1T tokens)
  • Architecture builds on pre-normalization (GPT-3), the SwiGLU activation function (PaLM), and Rotary Embeddings (GPT-Neo)
  • In contrast to the opacity of Chinchilla, PaLM, and GPT-3, LLaMA is trained only on open-source data (CC, C4, Wikipedia, etc.) and the model is released
  • However, the model and data cannot be used for commercial or production purposes (?)
  • They also report trying instruction finetuning (pic 4)
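
To make the three architecture ingredients in the list above more concrete, here is a minimal pure-Python sketch of each one on a single token vector. It is an illustration under assumptions, not the LLaMA implementation: the function names, the weight layout (lists of rows), the use of an RMS-style normalization without a learned gain, the adjacent-pair rotary layout, and the `base=10000.0` default are all choices made for this example.

```python
import math

def rms_norm(x, eps=1e-6):
    """Divide a vector by its root mean square (assumed norm, no learned gain)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def silu(z):
    """SiLU / Swish activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

def pre_norm_ffn_block(x, w_gate, w_up, w_down):
    """Pre-normalization: normalize the sublayer input, then add the residual."""
    y = swiglu_ffn(rms_norm(x), w_gate, w_up, w_down)
    return [xi + yi for xi, yi in zip(x, y)]

def rotary(x, position, base=10000.0):
    """Rotate each pair (x[i], x[i+1]) by the angle position * base**(-i/d)."""
    d = len(x)  # must be even
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

if __name__ == "__main__":
    q = [1.0, 2.0, 3.0, 4.0]
    k = [0.5, -1.0, 2.0, 0.25]
    # Both scores use a position offset of 2, so they should agree
    # up to floating-point error.
    print(dot(rotary(q, 3), rotary(k, 1)))
    print(dot(rotary(q, 5), rotary(k, 3)))
```

The point of the rotary step is that the query-key dot product depends only on the distance between the two positions, which is why the two printed scores should match. In a real model the rotation is applied to queries and keys inside attention, and the normalization carries a learned scale; both are omitted here to keep the sketch short.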