
Meta info.
  • Authors: Yangjun Ruan, Neil Band, Chris J. Maddison, Tatsunori Hashimoto
  • Paper: https://arxiv.org/pdf/2503.18866
  • Affiliation: Stanford Univ., Univ. of Toronto, Vector Institute.
  • Published: March 24, 2025

TL; DR

Proposes that an LLM's reasoning ability can be improved by training it, via bootstrapping, to generate latent thoughts: a structured internal reasoning representation (here, tokens).

Background

With CoT prompting now in general use, the usefulness of intermediate reasoning steps has been confirmed.

  • But LLMs are not actually trained on those intermediate steps, are they? (They are tuned on final answers.)

Problem Statement

Wouldn't reasoning performance improve if the LLM also received supervision on the intermediate reasoning steps?

Suggestions

Proposes the BoLT (Bootstrapped Latent Thoughts) model

  • Flow: Latent Thought Sampling > Answer Generation
  • $Z \sim q(Z|X; M_t)$
    • Z: latent thoughts, a CoT-like token sequence.
    • $M_t$: the model at iteration t. Trained with the EM algorithm.
    • objective: ELBO
      • sample good $Z$, while
      • keeping the distribution $q(Z|X)$ close to the prior $p(Z|X)$
  • Expectation step: sample K candidates of Z > generate $Y \sim p(Y|X, Z; M_t)$ conditioned on Z
    • Select the thought Z* judged most important for Y
    • (for each $Z_k$, the log-likelihood of $p(Y|X, Z_k)$ is used as the basis for a weight w)
  • Maximization step: train the model on {Z*, X, Y} to obtain the updated $M_{t+1}$
    • Special tokens are used: one for p, one for q, and <start/endoflatent> to mark where Z begins and ends
      • The data formatting matters, because $Z$ has to be learned together with $X$
      • posterior $q(Z|X)$: a model that predicts Z (given problem $X$, which thought $Z$ comes to mind?) > The catch: if only this is trained, Z may not be genuine reasoning but hacky reasoning that merely acts as a hint for getting Y right on the problem at hand (= overfitting)
      • joint $p(Z,X)$: a model that generates $Z$ and $X$ together (the flow is: given the correct thought $Z^*$, the problem must have looked like X) > The intent is to constrain Z with the idea that if Z is a genuinely good thought, X should be recoverable from Z alone, so that Z is not produced by simply looking at Y.
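
The selection part of the Expectation step above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the paper's implementation: the weights are taken to be a softmax over the log-likelihoods $\log p(Y|X, Z_k)$, Z* is drawn in proportion to those weights, and the function name `select_latent` and the toy scores are hypothetical.

```python
import math
import random


def select_latent(candidates, log_liks, rng=None):
    """Pick one latent thought Z* out of K sampled candidates.

    candidates: K latent-thought strings Z_1..Z_K (assumed sampled from q(Z|X; M_t))
    log_liks:   K floats, log p(Y | X, Z_k; M_t), assumed scored by the same model
    """
    rng = rng or random.Random(0)
    # Softmax over log-likelihoods yields normalized weights w_k.
    # Subtracting the max first keeps exp() numerically stable.
    max_ll = max(log_liks)
    unnorm = [math.exp(ll - max_ll) for ll in log_liks]
    total = sum(unnorm)
    weights = [u / total for u in unnorm]
    # Assumption: Z* is sampled in proportion to its weight
    # (taking the argmax would be another reasonable choice).
    z_star = rng.choices(candidates, weights=weights, k=1)[0]
    return z_star, weights


# Toy scores for illustration only; these are not numbers from the paper.
candidates = ["thought A", "thought B", "thought C"]
log_liks = [-12.0, -3.0, -9.0]
z_star, weights = select_latent(candidates, log_liks)
```

In the Maximization step, the selected `z_star` would be packed together with X and Y into a training example for $M_{t+1}$.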

Effects

  • Experimental setup
    • Benchmarks: MATH, GSM8K
    • backbone: TinyLlama-1.1B
    • 4 EM iterations
    • baseline
      • Raw Token Match: trained on the raw corpus only, without latent thoughts, for the same number of tokens as BoLT
      • Train FLOP Match: trained with multiple passes over a smaller corpus so that only the training FLOPs match BoLT
      • Latent Warmstart: a model started from synthetic latents Z generated by M_0 (gpt-4o-mini), with no bootstrapping
  • results:
    • BoLT, which applies the latent-thought idea, achieves the best results against the baselines that do not (Table 1)
    • The ELBO objective improves with every iteration > actual downstream task performance improves as well (Fig. 8)
      • MATH performance improves as the ELBO loss falls (from warm start through iter=4)
    • In fine-tuning as well, BoLT is more beneficial than raw data (Fig. 9)
      • MATH keeps improving, whereas GSM8K plateaus after iter=2
    • Confirmed that results improve the more bootstrapping is repeated (Fig. 11)

Personal note. This feels like the first really good paper in a while. It uses the "latent" concept, but the latents are tokens rather than vectors, so they can be read and interpreted directly (this work seems to be the first to propose that). CoT is, after all, inference-time reasoning, so I liked that the authors concretely designed and applied a way to learn the reasoning itself during training. The self-improvement idea of feeding the model's own generated reasoning back into training is validated both theoretically and experimentally, and using special tokens so that the model learns both the joint and the posterior structure struck me as clever. The limitations will obviously need follow-up work, though: every iteration is expensive (which is why the experiments had to use a 1.1B model), EM by nature is likely to depend heavily on how Z is initialized, which is a structural limitation, and Z is in any case bounded by what the backbone can do…
