
Meta info.

TL;DR

Proposes ReSpAct, a framework that integrates reasoning, conversation, and action capabilities into an LLM-based agent, enabling dynamic, collaborative, and context-aware task-solving in conversational environments.


Background

  • ReAct: reasoning + action → does not account for dynamic user interaction
  • Zero-shot-centered inference research: does not solve the error-propagation problem
  • Overall, achieving robust performance in conversational frameworks remains an open problem

Problem States

  • Can an LLM-based agent dynamically adjust its actions to match the user's needs?
  • For the interactions that arise in a conversational environment:
    • Can it make proper use of feedback and clarification interactions?
    • Can it flexibly handle exceptions and changing context?

Suggestion

Embed conversational interaction directly in the decision loop → better adaptability and success rate (Reason + Speak + Act)

  • A_hat (the agent's action space) = A (action) ∪ L (reasoning path) ∪ U (utterance): at every turn, the agent decides whether to act, think, or speak
    • Beyond internal reasoning (the thinking trace already used in ReAct), it includes dialogue actions (explanations, updates, requests for user input)
    • utterance: speech in which the agent communicates directly with the user is treated as its own category
    • Task solving without a pre-defined dialogue schema: it runs on a pretrained LLM (e.g., GPT-4o) with few-shot prompting alone. Roles are assigned in the prompt, with no separate tuning
      • “You are a helpful assistant. You can reason (THINK), act (ACT), or speak to the user (SPEAK)…”
      • Few-shot: examples that cover each step type (reason, speak, act)
Example (ALFWorld): put the cup on the shelf

1. *~~(Agent) observation: o_t (object list/locations)~~*
2. (Agent) *THINK*: “Where is the cup? To put it on the shelf, I need to find it first” → think.
3. (Agent) *SPEAK*: “Where is the cup?” → asks via speak(U)
4. (User) response(feedback): “It's on shelf 2”
5. (Agent) *ACT*: act("take cup from shelf2"), act("put cup on shelf1")....
6. *~~(Agent) update: update the context, then decide on the next action~~*
  • In other words, π (policy): C (based on the current context) → A ∪ L ∪ U
    • C: Observation, Action history, Response (user feedback)
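
The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `Context`, `parse_step`, `run_episode`, `scripted_policy`, `toy_env_step`, and `toy_user` are all hypothetical, the `THINK:` / `SPEAK:` / `ACT:` line format is an assumed output convention, and the LLM call is replaced by a scripted stand-in that replays the cup example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# The three step kinds mirror the augmented space: L (think), U (speak), A (act).
KINDS = ("think", "speak", "act")


@dataclass
class Context:
    """C: observations, step history, and user responses, kept as one ordered log."""
    log: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        self.log.append((role, text))


def parse_step(raw: str) -> Tuple[str, str]:
    """Split a raw line such as 'SPEAK: Where is the cup?' into (kind, text).

    The 'TAG: text' format is an assumption made for this sketch.
    """
    tag, sep, text = raw.partition(":")
    kind = tag.strip().lower()
    if not sep or kind not in KINDS:
        raise ValueError(f"unrecognized step: {raw!r}")
    return kind, text.strip()


def run_episode(
    policy: Callable[[Context], str],             # pi: C -> A ∪ L ∪ U, as raw text
    env_step: Callable[[str], Tuple[str, bool]],  # action -> (observation, done)
    ask_user: Callable[[str], str],               # utterance -> user response
    first_obs: str,
    max_turns: int = 10,
) -> Context:
    ctx = Context()
    ctx.add("observation", first_obs)
    for _ in range(max_turns):
        kind, text = parse_step(policy(ctx))
        ctx.add(kind, text)
        if kind == "speak":
            # Utterances go to the user; the reply becomes part of the context.
            ctx.add("user", ask_user(text))
        elif kind == "act":
            # Actions go to the environment; the new observation is logged.
            obs, done = env_step(text)
            ctx.add("observation", obs)
            if done:
                break
        # "think" steps are only logged; they change neither user nor environment.
    return ctx


def scripted_policy(ctx: Context) -> str:
    """Stand-in for the LLM: replays the cup example one step per turn."""
    script = [
        "THINK: Where is the cup? I need to find it first.",
        "SPEAK: Where is the cup?",
        "ACT: take cup from shelf2",
        "ACT: put cup on shelf1",
    ]
    steps_taken = sum(1 for role, _ in ctx.log if role in KINDS)
    return script[steps_taken]


def toy_env_step(action: str) -> Tuple[str, bool]:
    """Toy environment: the episode ends once the cup is placed on shelf1."""
    return f"ok: {action}", action == "put cup on shelf1"


def toy_user(utterance: str) -> str:
    """Toy user: always answers with the cup's location."""
    return "It's on shelf 2."


ctx = run_episode(scripted_policy, toy_env_step, toy_user, "You are in a kitchen.")
for role, text in ctx.log:
    print(f"{role:>11}: {text}")
```

In a real setup, `scripted_policy` would be replaced by a call to a few-shot-prompted LLM that receives the serialized context; the dispatch on step kind is the part that distinguishes this loop from a plain reason-and-act loop.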

Effects

  • ALFWorld: achieves the best success rate of 87.3% (vs. 80.6% for ReAct, with 10%p fewer invalid actions)
  • MultiWOZ: improves Inform / Success by 5.5%p / 3%p relative to gpt-4o-mini
  • WebShop: success rate of 12% (vs. 8% for ReAct); extending to an environment that includes user feedback raises it to 50% (in terms of Avg Score, 20.1 → 85.8)
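
Since the bullets above mix percentages and percentage points, it can help to spell out the absolute and relative gaps for the two ReAct comparisons. The figures come from the list above; the helper name `gain` is hypothetical and exists only for this illustration.

```python
def gain(baseline: float, new: float) -> tuple:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    abs_pp = round(new - baseline, 1)
    rel_pct = round(100 * (new - baseline) / baseline, 1)
    return abs_pp, rel_pct


# Success rates taken from the bullets above (ReAct baseline first).
alfworld = gain(80.6, 87.3)
webshop = gain(8.0, 12.0)

print("ALFWorld:", alfworld)  # (6.7, 8.3)
print("WebShop:", webshop)    # (4.0, 50.0)
```

The WebShop gap is small in percentage points but large in relative terms, because the baseline success rate is low.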

Personal note. I came across this while going through IWSDS 2025, and I am sharing it because it seemed worth reading among that workshop's papers. In its proactive flavor it is close in spirit to our current research, and at the same time it strikes me as a trendier line of work on agentic action. Pointing out as a limitation that ReAct has the model think internally on its own without considering user feedback (human-in-the-loop), a growing number of studies, not only this one, are seriously working on improving agent performance in real chat environments. That said, the parts described as "planning" are very abstract; on closer inspection they often amount to little more than the model carrying the previous state or history at the prompt level, which feels like a common limitation.