1 minute read

Meta info.
  • Authors: Justin Chih-Yao Chen, Zifeng Wang, Hamid Palangi, Rujun Han, Sayna Ebrahimi, Long Le, Vincent Perot, Swaroop Mishra, Mohit Bansal, Chen-Yu Lee, Tomas Pfister
  • Paper: https://arxiv.org/pdf/2411.19865
  • Affiliation: Google Cloud AI Research, Google DeepMind, UNCChapel Hill
  • Published: November 29, 2024

TL; DR

LLM์ด '์—ญ๋ฐœ์ƒ'์„ ํ•™์Šตํ•˜๋„๋ก ํ›ˆ๋ จํ•˜๋ฉด ์ƒ์‹, ์ˆ˜ํ•™, ๋…ผ๋ฆฌ์  ์ถ”๋ก ๊ฐ™์€ task ์„ฑ๋Šฅ ํ–ฅ์ƒ์— ํฐ ๋„์›€. x10๋งŒํผ์˜ forward training(standard finetuning)๋ณด๋‹ค ์„ฑ๋Šฅ์ด ๋›ฐ์–ด๋‚˜๋‹ค๊ณ  ์ฃผ์žฅ.

image.png

image.png

image.png

image.png

image.png

image.png

Suggestion

Distillation ๋ฐฉ์‹์œผ๋กœ ์—ญ๋ฐฉํ–ฅ ์ถ”๋ก  ํ•™์Šตํ•˜๋Š” REVTHINK ํ”„๋ ˆ์ž„์›Œํฌ ์ œ์•ˆ

  • data augmentation: ๊ต์‚ฌ ๋ชจ๋ธ์˜ fs prompting์„ ํ†ตํ•ด (1) forward reasoning(CoT), (2) backward ์งˆ๋ฌธ, (3) backward reasoning(CoT)์„ ์ƒ์„ฑํ•˜์—ฌ ๋ฐ์ดํ„ฐ ์ฆ๊ฐ•
  • training objective: ์ •ํ™•ํ•œ forward reasoning (vanilla knowledge distillation) + backward ์งˆ๋ฌธ ์ƒ์„ฑ + ์•ž์„œ ์ƒ์„ฑํ•œ ์งˆ๋ฌธ์— ๋Œ€ํ•œ ์ •ํ™•ํ•œ backward reasoning
  • test time inference: ํ•™์ƒ ๋ชจ๋ธ์€ forward reasoning๋งŒ ์ˆ˜ํ–‰.

Effects

  • ํ•™์ƒ ๋ชจ๋ธ์˜ zs ์„ฑ๋Šฅ๋ณด๋‹ค ํ‰๊ท  13.53% ํ–ฅ์ƒ, standard KD(forward inference only) ๋ณด๋‹ค 6.84% ํ–ฅ์ƒ
  • forward reasoning ๋ฐ์ดํ„ฐ์…‹ 10%๋งŒ ์‚ฌ์šฉ, 10๋ฐฐ ๋” ๋งŽ์€ forward reasoning์œผ๋กœ vanilla finetuning ๋ฐฉ๋ฒ•๋ณด๋‹ค ์„ฑ๋Šฅ ์ข‹์•˜์Œ

Personal note. ๊น”๋”ํ•œ ๋ฐฉ๋ฒ•์œผ๋กœ ํšจ๊ณผ์ ์ธ ๊ฒฐ๊ณผ. ์—ญ๋ฐœ์ƒ ์˜ˆ์‹œ๋ฅผ ๋“ค๋ฉด โ€œ์‚ฌ๊ณผ 2๊ฐœ๋ž‘ ๋ฐฐ 3๊ฐœ๊ฐ€ ์žˆ๋‹ค๋ฉด ๊ณผ์ผ ์ด ๋ช‡ ๊ฐœ ์žˆ๋‚˜์š”?โ€ ๊ฐ™์€ ์‚ฐ์ˆ˜๋ฌธ์ œ์— ๋Œ€ํ•ด forward ๋Š” 2+3=5 ๊ฐ™์€ ๊ตฌ์กฐ๋ผ๋ฉด backward๋Š” โ€œ5๊ฐœ ๊ณผ์ผ์—์„œ ๋ฐฐ 3๊ฐœ ์žˆ๋‹ค๋ฉด ์‚ฌ๊ณผ๋Š” ๋ช‡ ๊ฐœ ์žˆ๋‚˜์š”?โ€ ๊ฐ™์ด ์ถ”๋ก ์‹œํ‚ค๋Š” ๊ฒฝ์šฐ. ๋งŒ์•ฝ forward reasoning์—์„œ 5๊ฐ€ ์•„๋‹ˆ๋ผ 6์œผ๋กœ ์ž˜๋ชป ์ถ”๋ก ๋๋‹ค๋ฉด bacakward reasoning์—์„œ ์ˆ˜์ •๋  ์—ฌ์ง€..