
Meta info.
  • Authors: Qizheng Zhang, Changran Hu, Shubhangi Upasani, Boyuan Ma, Fenglu Hong, Vamsidhar Kamanuru, Jay Rainton, Chen Wu, Mengmeng Ji, Hanchen Li, Urmish Thakker, James Zou, Kunle Olukotun
  • Paper: https://www.arxiv.org/pdf/2510.04618
  • Affiliation: SambaNova Systems Inc., Stanford Univ., UC Berkeley
  • Published: October 6, 2025

TL; DR

Proposes ACE, a prompt refinement framework that runs generation > reflection > curation modules and applies only incremental delta updates.


Background

  • As models grow larger, fine-tuning becomes inefficient, whereas context engineering is understood to be easier and more intuitive
    • context adaptation: leaving the model as-is and simply cultivating the context well yields large performance gains
      • system prompt: defines the model's role and attitude
      • memory: summarizes past experience for use in the next task
      • retrieval augmentation: reasoning with external knowledge
  • The problem: during context engineering, prompts tend to be updated toward excessive brevity and loss of context

Problem Statement

The way an LLM's context evolves is prone to collapse and inefficient to maintain

  • Can the context be adapted over time without shrinking knowledge or losing the richness of the domain?
  • Inefficiency
    • Hasn't natural-language feedback (Reflexion, TextGrad, GEPA, etc.) amounted to little more than simple prompt summarization?
    • If anything, the flow of rich domain strategies and examples gets destroyed
    • From a pipeline-design standpoint too, it is hard to interpret why the new prompt came out the way it did; isn't this closer to random change than to prompt evolution?
  • context collapse: knowledge-rich >>> summary-only
    • Fully rewriting (replacing) the prompt is the common approach
    • Knowledge disappears as iterations repeat
    • preliminary experiment (AppWorld): when a single LLM rewrite shrank an 18,282-token prompt to 122 tokens, accuracy dropped sharply from 66.7% → 57.1%
      • This is lower than the baseline without context (63.7%)
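To make the size of that collapse concrete, the short sketch below just redoes the arithmetic on the figures quoted above. The variable names are my own; the numbers are the ones from the notes.

```python
# Figures quoted in the notes above (AppWorld preliminary experiment).
tokens_before, tokens_after = 18_282, 122
acc_before, acc_after, acc_no_context = 66.7, 57.1, 63.7

# How much shorter the prompt became after one rewrite.
compression = tokens_before / tokens_after
# Accuracy lost by the rewrite, in percentage points.
drop_pp = acc_before - acc_after
# Gap between the collapsed prompt and running with no context at all.
vs_baseline_pp = acc_after - acc_no_context

print(f"compression: {compression:.1f}x")                        # 149.9x
print(f"accuracy drop: {drop_pp:.1f} pp")                        # 9.6 pp
print(f"vs. no-context baseline: {vs_baseline_pp:+.1f} pp")      # -6.6 pp
```

In other words, a roughly 150x shorter prompt cost about 9.6 pp and ended up about 6.6 pp below using no adapted context at all.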

Suggestions

ACE

  • Flow: the generator produces a reasoning trajectory → the reflector derives lessons → the curator integrates the lessons as bullets == builds the playbook
  • bullet: the context is restructured into a modular playbook made of bullets; the bullet is the smallest unit of context
    • { id, text, helpful_count, harmful_count, last_update_epoch, origin }
      • id : bullet identifier
      • text : the actual content (rule, strategy, concept, …)
      • helpful/harmful_count : number of times the bullet was referenced in successes/failures
      • last_update_epoch : time of the last update
      • origin : reflector / curator generation info <- makes it possible to trace which experience the bullet was learned from
  • incremental delta updates: add new delta bullets and merge them on conflict
    • v_{t+1} = v_t \oplus \delta v_t: keep the previous context and apply only the new content incrementally
    • Inspect the generator's trace to see which bullets were useful and what is still missing (in natural language)
  • grow-and-refine loop (the curator's role)
    • grow: add new bullets <- uses embedding similarity
    • refine: merge duplicate bullets, remove stale or low-utility bullets <- llm call
  • reflector // curator role separation
    • reflector: the llm reasons and proposes lessons
    • curator: validates and merges lessons (non-llm) <- guarantees structural stability
      1. parse the lessons into bullet candidates
      2. generate \delta v_t: contains only the new parts
      3. run grow-and-refine
      4. update metadata : helpful_count, harmful_count, ...
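The bullet schema and the curator steps above can be sketched roughly as follows. This is a minimal illustration under my own assumptions, not the paper's implementation: `Bullet`, `apply_delta`, `record_feedback`, and `refine` are hypothetical names, `difflib` string similarity stands in for embedding similarity, the id scheme is invented, and the pruning rule (drop bullets that hurt more often than they helped) is just one plausible choice.

```python
from dataclasses import dataclass, replace
from difflib import SequenceMatcher
from typing import Iterable, List, Set


@dataclass
class Bullet:
    """One playbook entry, mirroring the fields listed in the notes."""
    id: str
    text: str
    helpful_count: int = 0
    harmful_count: int = 0
    last_update_epoch: int = 0
    origin: str = "reflector"


def similarity(a: str, b: str) -> float:
    # Assumption: plain string similarity as a stand-in for embedding similarity.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def apply_delta(playbook: List[Bullet], lessons: Iterable[str],
                epoch: int, threshold: float = 0.9) -> List[Bullet]:
    """v_{t+1} = v_t (+) delta v_t: keep every old bullet, add only novel lessons."""
    updated = [replace(b) for b in playbook]  # copy so the old playbook is untouched
    for i, lesson in enumerate(lessons):
        match = next((b for b in updated
                      if similarity(b.text, lesson) >= threshold), None)
        if match is not None:
            # Near-duplicate: merge by refreshing metadata instead of appending.
            match.last_update_epoch = epoch
        else:
            # Hypothetical id scheme: epoch number plus position in the delta.
            updated.append(Bullet(id=f"e{epoch}-{i}", text=lesson,
                                  last_update_epoch=epoch,
                                  origin=f"epoch {epoch}"))
    return updated


def record_feedback(playbook: List[Bullet], used_ids: Set[str], success: bool) -> None:
    """Bump helpful/harmful counters for the bullets a trajectory referenced."""
    for b in playbook:
        if b.id in used_ids:
            if success:
                b.helpful_count += 1
            else:
                b.harmful_count += 1


def refine(playbook: List[Bullet]) -> List[Bullet]:
    # Assumed pruning rule: drop bullets that hurt more often than they helped.
    return [b for b in playbook if b.harmful_count <= b.helpful_count]
```

The point of the sketch is that nothing ever rewrites the whole playbook: each step either appends a bullet, touches one bullet's metadata, or removes one bullet, so the previous context survives each update.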

Effects

  • baselines
    • backbone: deepseek-v3.1-instruct (no thinking)
      • built on a ReAct (memory + system prompt) structure
    • no context, ICL (few-shot)
    • MIPROv2, GEPA, DC (Dynamic Cheatsheet)
    • ACE (the proposed method), offline / online
      • offline : uses a pre-built ("pretrained") playbook > no updates on new tasks (adaptation is applied only during training, i.e. prompt pretraining)
      • online : the reflector + curator keep updating during execution (inference-time continual learning)
  • All benchmarks used involve solving problems over multiple turns (AppWorld, FiNER, Formula, etc.)
  • Fig1 Tab1 AppWorld: a reasoning benchmark in a simulated tool-use environment; tool call + observation reasoning + feedback loop
    • Average gain of more than 10 pp; notably, on the test-challenge split it outperforms the GPT-4.1-based IBM CUGA
    • online ACE can learn from execution feedback alone, without labels
    • Taken to mean that context length is preserved and no collapse occurs
  • Tab2 Strong performance on reasoning tasks such as FiNER (financial reasoning task) and Formula (numerical reasoning)
  • Tab4 Because the curator is non-llm, speed improves by up to nearly 10x
  • Tab3 All components are judged to make a real contribution
  • Qualitative evaluation
    • As the number of lessons generated by the reflector grew, context length did in fact increase
    • The authors claim there is no collapse (length is maintained)
    • To a human reader, the playbook was observed to become increasingly domain specific
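The difference between the offline and online settings listed above is mostly a matter of when the playbook is allowed to change. A toy sketch of the two control flows, with stubbed components: `run_task`, `reflect`, and `curate` are hypothetical placeholders for the generator, reflector, and curator, and the playbook is reduced to a list of strings.

```python
from typing import List, Tuple

Playbook = List[str]


def run_task(task: str, playbook: Playbook) -> str:
    # Hypothetical generator stub: returns a fake trajectory string.
    return f"trace({task}) using {len(playbook)} bullets"


def reflect(trace: str) -> List[str]:
    # Hypothetical reflector stub: derives one lesson per trajectory.
    return [f"lesson from {trace}"]


def curate(playbook: Playbook, lessons: List[str]) -> Playbook:
    # Hypothetical curator stub: append only lessons not already present.
    return playbook + [l for l in lessons if l not in playbook]


def adapt_offline(train_tasks: List[str], test_tasks: List[str]) -> Tuple[Playbook, List[str]]:
    """Build the playbook on training tasks, then freeze it for the test tasks."""
    playbook: Playbook = []
    for task in train_tasks:
        playbook = curate(playbook, reflect(run_task(task, playbook)))
    outputs = [run_task(task, playbook) for task in test_tasks]  # no updates here
    return playbook, outputs


def adapt_online(test_tasks: List[str]) -> Tuple[Playbook, List[str]]:
    """Keep updating the playbook after every test task."""
    playbook: Playbook = []
    outputs: List[str] = []
    for task in test_tasks:
        trace = run_task(task, playbook)
        outputs.append(trace)
        playbook = curate(playbook, reflect(trace))  # update before the next task
    return playbook, outputs
```

In the offline variant every test task sees the same frozen playbook, while in the online variant each task sees whatever the earlier test tasks contributed.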

Personal note. I think this was one of the papers that drew attention over the holidays. My impression is that the proposed incremental delta update is, in the end, a memory that collects only what self-critique judged worth updating (= necessary) and useful. Granted, these are reasoning tasks, but I understood it as one methodology for explicitly managing memory inside the prompt as turns go by. The way this memory is built is simple, but since reflection is possible even without ground truth, it seems meaningful that performance improves when the model reasons hard on its own online and updates on the results. I will think about whether this can serve as a reference for memory management in dialogue settings, where there is no correct answer to begin with.