
Meta info.
  • Authors: Liu, Zifeng Wang, Jin Miao, I-Hung Hsu, Jun Yan, Jiefeng Chen, Rujun Han, Fangyuan Xu, Yanfei Chen, Ke Jiang, Samira Daruki, Yi Liang, William Yang Wang, Tomas Pfister, Chen-Yu Lee
  • Paper: https://arxiv.org/pdf/2511.17006
  • Affiliation: Google Cloud AI Research, Google DeepMind, NYU, UC Santa Barbara
  • Published: November 21, 2025

TL; DR

Simply increasing the tool-call budget does not make agent performance scale at test time (TTS). Introducing Budget Tracker and the BATS framework, which make the agent explicitly aware of its budget, substantially improves cost-performance scaling and the Pareto frontier.


Background

  • Test-time scaling (TTS): improving performance by increasing inference-time compute, without increasing trained parameters
    • TTS methods such as sequential refinement, parallel sampling, and aggregation scale inference cost along the token-consumption axis
  • Rise of tool-augmented agents: web search, browsing, etc.
    • Tool calls are a cost of external actions, so they do not scale through token consumption alone

Problem Statement

  • Does performance really fail to improve when the tool-call budget is expanded?
    • In practice, expanding the tool-call budget yields either no improvement or rapid saturation
      • Even with ample budget, agents stop early or fail to balance exploration and verification
  • Can the agent be made to understand its remaining budget?

Suggestions

  • Narrow the scope of TTS to search agents
    • Search agent: given an input, runs a ReAct-style loop: Thought > collect external knowledge via search/browse > reasoning (Thought) > … > answer
      • Search: takes a general search-engine query as input and returns the title, snippet, and URL of each result (broad exploration)
      • Browse: scrapes a specific URL (deep exploration)
  • Agent scaling formulation under a budget constraint: maximize expected accuracy subject to the budget
    • For each tool t_i, the number of calls c_i cannot exceed its budget b_i.
  • Unified cost for fair comparison = token cost + tool-call cost
    • More tool calls tend to come with proportionally more token usage
      • The more the agent searches, the more tokens it spends receiving, processing, and reading the results (strong correlation)
    • token cost: interpreted as the agent's internal cognitive effort (spent on processing internal knowledge, reasoning, planning, etc.)
    • tool-call cost: the API call cost of search/browse, i.e. the external service cost
  • methods:
    • Budget Tracker: inserted into the agent loop at the prompt level
      • prompt: total budget per tool, number of calls used so far, number of calls remaining, and behavioral guidance that depends on the budget state (e.g. search aggressively when the budget is plentiful, … focus on verification when it is low)
      • Addresses a ReAct limitation: because the budget is managed as an explicit state variable, it can override the agent's internal heuristic sense that it has already done roughly enough
    • BATS (Budget-Aware Test-time Scaling)
      • Budget-Aware Planning: designed so that searches already attempted are not repeated
        • Decompose the question's constraints into (1) exploration clues (information to find) and (2) verification clues (conditions for verifying the answer)
        • Maintain a tree-structured checklist (plan) in which each node records its status (completed or not) and the related tool-call history
      • Budget-Aware Self-Verification & Continue/Pivot
        • For a candidate answer: (1) evaluate each constraint: is it satisfied, is it verified? (2) taking the remaining budget into account, decide whether to stop, keep exploring, or change strategy (3) if the run does not terminate, compress the existing trajectory to manage context length
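
The budget-constrained formulation and the unified cost above can be written out roughly as follows. This is a sketch assembled from the notes, not the paper's exact notation: the policy \pi, the question distribution \mathcal{D}, the accuracy function \mathrm{Acc}, and the per-call price p_i are symbols assumed here for illustration; only t_i, c_i, and b_i appear in the notes.

```latex
% Maximize expected accuracy subject to a per-tool call budget.
\max_{\pi}\; \mathbb{E}_{x \sim \mathcal{D}}\big[\mathrm{Acc}(\pi(x))\big]
\quad \text{s.t.} \quad c_i(x;\pi) \le b_i \quad \text{for every tool } t_i

% Unified cost = token cost + tool-call cost,
% assuming each call to tool t_i is billed at a fixed price p_i.
C_{\mathrm{unified}}(x;\pi) = C_{\mathrm{token}}(x;\pi) + \sum_{i} c_i(x;\pi)\, p_i
```

Pricing tool calls linearly per call is the simplest reading of "API call cost"; a real deployment could use tiered or per-page pricing instead.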
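
To make the Budget Tracker idea concrete, here is a minimal Python sketch of a per-tool budget state that is rendered into the prompt at every step. The class name, the `<budget>` tag format, the guidance wording, and the 0.5 / 0.2 thresholds are all assumptions made for illustration; the paper's actual prompt is not reproduced here. The `unified_cost` helper likewise assumes a flat price per call.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetTracker:
    """Hypothetical per-tool budget state, rendered into the agent prompt."""

    budgets: dict[str, int]                              # total calls allowed per tool (b_i)
    used: dict[str, int] = field(default_factory=dict)   # calls made so far (c_i)

    def remaining(self, tool: str) -> int:
        return self.budgets[tool] - self.used.get(tool, 0)

    def consume(self, tool: str) -> None:
        """Record one call, enforcing the constraint c_i <= b_i."""
        if self.remaining(tool) <= 0:
            raise RuntimeError(f"budget exhausted for tool {tool!r}")
        self.used[tool] = self.used.get(tool, 0) + 1

    def guidance(self, tool: str) -> str:
        """Map the remaining fraction to a behavioral hint (thresholds are assumed)."""
        total = self.budgets[tool]
        frac = self.remaining(tool) / total if total > 0 else 0.0
        if frac > 0.5:
            return "plenty left: explore broadly"
        if frac > 0.2:
            return "moderate: follow only the most promising leads"
        return "low: focus on verifying the current answer"

    def render(self) -> str:
        """Build the text block appended to the prompt before each Thought step."""
        lines = ["<budget>"]
        for tool, total in self.budgets.items():
            lines.append(
                f"{tool}: used {self.used.get(tool, 0)}/{total}, "
                f"remaining {self.remaining(tool)} ({self.guidance(tool)})"
            )
        lines.append("</budget>")
        return "\n".join(lines)


def unified_cost(
    token_cost: float, tracker: BudgetTracker, price_per_call: dict[str, float]
) -> float:
    """Token cost plus tool-call cost, assuming a flat price per call."""
    tool_cost = sum(
        tracker.used.get(tool, 0) * price_per_call[tool] for tool in tracker.budgets
    )
    return token_cost + tool_cost
```

In a ReAct-style loop, the agent would call `consume("search")` or `consume("browse")` right before each tool call and append `render()` to the next prompt, so the remaining budget is always visible as explicit state rather than left to the model's own estimate.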

Effects

  • benchmarks: BrowseComp (English web), BrowseComp-ZH (Chinese web), HLE-Search (the questions from Humanity's Last Exam that require search)
  • backbone: Gemini-2.5-Pro, Claude-Sonnet-4
  • tool: Google Custom Search API (Search), Jina.ai + Crawl4AI (Browse)
  • Results
    • Table 3: under the same tool budget, large gains over ReAct across all models
      • Despite being training-free, the gap is especially large on BrowseComp
    • Fig. 8: early-stopping analysis
      • ReAct: performance does not improve even when budget is available, and it almost never browses
      • BATS: browses more actively as the budget grows, and still matches or beats the baseline at low budgets
    • Table 4: ablation
      • Removing planning or verification lowers performance; removing verification hurts especially on search-based questions

Personal note. Being an extension of prompting and ReAct may not make this especially interesting as research, but the benefits in extensibility and convenience look hard to dispute. The next question is probably how the budget should be set in the first place, which the authors also touch on at the end of the appendix. If the problem starts from the agent failing to keep continuous track of what it has already done, what state it is in now, and what more it can still do, then it seems to lead naturally to questions about memory. On the other hand, budget also strikes me as the most easily accessible factor for personalization.