less than 1 minute read

Meta info.

TL; DR

์—ฌ๋Ÿฌ ๊ฐœ์˜ ์ž‘์€ ๋ชจ๋ธ์„ Blendํ•ด์„œ ํ•˜๋‚˜์˜ ํฐ ๋ชจ๋ธ๊ณผ ๋น„์Šทํ•œ ํ˜น์€ ๋” ๋‚˜์€ ์„ฑ๋Šฅ์„ ๋‚ผ ์ˆ˜ ์žˆ๋‹ค.

Untitled 3

Untitled 4

Untitled

Untitled

Untitled

Suggestions

  • blending: ์—ฌ๋Ÿฌ ์‹œ์Šคํ…œ ์ค‘ ํ™•๋ฅ ์ ์œผ๋กœ ํ•œ ์‹œ์Šคํ…œ์ด ๋‹ต๋ณ€ ์ƒ์„ฑ์„ ๋‹ด๋‹น
  • ensembling:ย Bayesian statistics ์›์น™์— ๋”ฐ๋ผ ChatAI๊ฐ€ ํŠน์ • ์‘๋‹ต์— ํ• ๋‹นํ•˜๋Š” ํ™•๋ฅ ์„ marginal expectation๋กœ ๊ฐœ๋…ํ™”ํ•  ์ˆ˜ ์žˆ๋‹ค๊ณ . ์—ฌ๋Ÿฌ ChatAI ์‹œ์Šคํ…œ์ด ๊ฒฐํ•ฉ๋œ ๊ฒฝ์šฐ, ์ „์ฒด ์‹œ์Šคํ…œ์ด ๊ฐœ๋ณ„ ์‹œ์Šคํ…œ์˜ ๊ฒฐํ•ฉ ํ™•๋ฅ ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๊ฐ€์žฅ ๊ฐ€๋Šฅ์„ฑ์ด ๋†’์€ ์‘๋‹ต์„ ์ถ”์ •ํ•˜์—ฌ ์ „์ฒด์ ์ธ ์‘๋‹ต ์„ฑ๋Šฅ ํ–ฅ์ƒํ•  ์ˆ˜ ์žˆ์Œ. (pic1, 2)
  • โ€œIntegrating just 3 models of moderate size (6B/13B) can rival or even surpass the performance metrics of a substantially larger model like ChatGPT (175B+)โ€ (pic3)