Meta info.

TL;DR

An LM generates training data through self-talk, filters it, and then uses it for SFT (bootstrapping). To relieve the bottleneck in this loop, the paper proposes an automatic metric that measures whether a dialogue succeeded.

Suggestions

  • bootstrapping: a method of initializing or improving a model using its own outputs. The paper points to (initial) data quality degradation, compute resource limits, and propagation of model errors as bottlenecks of the self-talk loop.
  • Quantifying subgoal completion is the core of the dialogue evaluation; ROUGE-L is the main measure used.
  • Dialogue quality is evaluated by comparing each dialogue against the predefined "workflow steps" of the conversation. The result: keeping only dialogues that complete 5 or more workflow steps, or only the top 5% by completion ratio, gives the best performance.
    • My guess is that performance drops as the criterion gets stricter than this because the fine-tuning dataset becomes too small.
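
The filtering idea in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the whitespace tokenization, and the 0.5 match threshold are all assumptions made here. Only the use of ROUGE-L and the "5 or more completed steps" cutoff come from the notes above.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 over lowercased whitespace tokens (tokenization is an assumption)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def completed_steps(workflow: List[str], agent_turns: List[str],
                    threshold: float = 0.5) -> int:
    """Count workflow steps matched by at least one agent turn.

    The threshold is a hypothetical value chosen for illustration.
    """
    return sum(
        any(rouge_l_f1(step, turn) >= threshold for turn in agent_turns)
        for step in workflow
    )


def filter_dialogues(dialogues: List[List[str]], workflow: List[str],
                     min_steps: int = 5, threshold: float = 0.5) -> List[List[str]]:
    """Keep only dialogues that complete at least `min_steps` workflow steps."""
    return [
        d for d in dialogues
        if completed_steps(workflow, d, threshold) >= min_steps
    ]
```

Here each dialogue is represented simply as a list of agent utterances. The dialogues that survive `filter_dialogues` would then be used as the SFT data for the next round. The top-5% variant would instead sort dialogues by `completed_steps(...) / len(workflow)` and keep the highest-scoring slice.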

Personal note. SFT still seems to be nearly the only reliable way to improve a model's dialogue performance, and even there the trend suggests that dataset size is the key to further gains.