
Meta info.

TL;DR

Learning a user's explicit preferences as memory over multi-turn interaction significantly improves long-term collaboration (success rate / efficiency / user burden) compared with simple recall-based memory.

Paper of the day slide

MultiSessionCollab figure 1

MultiSessionCollab figure 2

MultiSessionCollab figure 3

MultiSessionCollab figure 4

Background

  • long context != long-term memory + memory != recall
    • Prior work) only checks whether past information is remembered and whether responses stay consistent with past settings
  • Preference elicitation within a single session alone is not enough for long-term collab.

Problem Statement

Let's check: did memory actually help collab.?

  • Premise) good memory = getting the information right = reducing the user's cognitive burden and raising collab. efficiency

Suggestions

  • MultiSession Collab
    • task: MATH-500/MATH-Hard, LogiQA, MMLU, MedQA, ..
    • The user works on one problem across multi-session and manages a draft answer for it alone (the agent cannot see it)
      • The draft is updated only when the agent's response is useful and matches the user's preferences
      • i.e., if the response does not match the preferences, it is not incorporated, even when it is correct
    • Exactly 3 interaction preferences are built for each persona taken from Persona Hub
      • The preferences follow a taxonomy grounded in psychology and HCI (Appendix A)
      • e.g. avoid unnecessary preamble; prefer a high-level explanation up front; prefer step-by-step responses or prefer an all-at-once response; avoid proactive suggestions; require TLDR/bullets/confidence, etc.
      • The fact that the preferences are designed as discrete items will probably be raised as a limitation
    • single session simulation
      • user: describes the problem incompletely and deliberately withholds information (to elicit clarifying questions)
      • agent: tries to ask a question or give an explanation
      • user again: (internally) judges whether the agent's latest response satisfied the preferences
        • If not satisfied, the user gives an utterance that enforces the preference; this is treated as the learning signal
        • If satisfied, the draft answer is updated
      • Check the termination condition: at most 10 turns, or stop once the user is satisfied
  • Memory: session-level reflection:
    • After each session, summarize which preferences surfaced and how they should be satisfied -> memory update
    • At the start of the next session the full memory is provided; at each turn only the memory relevant to the current conversation is retrieved
  • RL: the response is not trained directly; the reward measures how well the reflection captured the preferences
    • reward: coverage + format
      • Whether the user-enforced preferences were captured properly + whether the reflection is structurally well organized
      • Response improvement is assumed to happen indirectly, through the memory
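
To make the session loop, the reflection step, and the coverage + format reward above more concrete, here is a minimal sketch. Everything in it is an assumption for illustration, not the paper's implementation: `run_session`, `reflect`, `reflection_reward`, `toy_agent`, and `toy_satisfies` are hypothetical names, preferences are plain keyword strings, a session ends as soon as one response satisfies all preferences, and the reward is a simple set-overlap stand-in for whatever judge the paper actually uses.

```python
from dataclasses import dataclass, field

MAX_TURNS = 10  # turn cap from the notes above


@dataclass
class SessionLog:
    turns: int = 0
    enforced: list = field(default_factory=list)  # one entry per enforcement
    draft: str = ""  # user's private draft answer; never shown to the agent


def run_session(agent, satisfies, preferences, problem, memory):
    """Simulate one session; `agent` and `satisfies` are hypothetical callables."""
    log = SessionLog()
    feedback = []  # preferences the user has enforced so far in this session
    for _ in range(MAX_TURNS):
        log.turns += 1
        response = agent(problem, memory, feedback)
        violated = [p for p in preferences if not satisfies(response, p)]
        if violated:
            # Unsatisfied: the user enforces the preference (the learning
            # signal) and the draft is NOT updated, even if the answer is right.
            log.enforced.extend(violated)
            feedback.extend(violated)
            continue
        log.draft = response  # satisfied: the user updates the draft and stops
        break
    return log


def reflect(memory, log):
    """Session-level reflection: merge newly enforced preferences into memory."""
    return list(dict.fromkeys(memory + log.enforced))


def reflection_reward(reflection, enforced):
    """Toy coverage + format reward for a reflection (a list of memory entries)."""
    targets = set(enforced)
    coverage = len(targets & set(reflection)) / len(targets) if targets else 1.0
    well_formed = bool(reflection) and all(
        isinstance(e, str) and e.strip() for e in reflection
    )
    return coverage + (1.0 if well_formed else 0.0)


# Toy stand-ins: the agent echoes every preference it knows about, and a
# preference counts as satisfied if its keyword appears in the response.
def toy_agent(problem, memory, feedback):
    return " ".join([problem, *memory, *feedback])


def toy_satisfies(response, preference):
    return preference in response


prefs = ["tldr", "bullets"]
memory = []
first = run_session(toy_agent, toy_satisfies, prefs, "2+2", memory)
memory = reflect(memory, first)
second = run_session(toy_agent, toy_satisfies, prefs, "3+3", memory)
print(first.turns, len(first.enforced))    # 2 2
print(second.turns, len(second.enforced))  # 1 0
```

In this toy run the first session needs two enforcements and a second turn, while the second session starts from the reflected memory and finishes in one turn with no enforcement, which is the user-burden effect the benchmark is trying to measure.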

Effects

  • metrics: defines measures of the user's cognitive load
    • Task Success: accuracy of the final draft
    • User Effort: number of times the user had to enforce a preference
    • Conversation Length: length of the conversation
  • Results: adding memory reduces user burden + training the reflection with GRPO can improve task success
    • The memory-based agent reaches almost the same performance as an agent that is given the oracle preferences
      • Analysis: the oracle only gives a description of the preferences, whereas memory also accumulates the context / application setting, which made it more useful
    • The improvement is most pronounced over sessions 1 through 5 and stabilizes afterwards
      • The gain in user effort is larger than the gain in task success = performance improves after the conversation becomes more comfortable
    • human interaction experiment:
      • 19 participants, 3 sessions each, in a coding-only or mixed-domain setting, with subjective ratings (preference adherence, memory, confidence, satisfaction, ...)
      • Results are better with memory, but the mixed-domain generalization effect was marginal, and
      • it is more effective when the user simply states the preferences
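
The three metrics above can be aggregated per agent with a few lines of code. This is a sketch under stated assumptions: `SessionResult` and `summarize` are hypothetical names, and conversation length is counted in turns, which the notes do not specify.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SessionResult:
    correct: bool      # was the user's final draft correct?
    enforcements: int  # how many times the user enforced a preference
    turns: int         # conversation length, assumed to be counted in turns


def summarize(results):
    """Average the three metrics over a non-empty list of sessions."""
    return {
        "task_success": mean(1.0 if r.correct else 0.0 for r in results),
        "user_effort": mean(r.enforcements for r in results),
        "conversation_length": mean(r.turns for r in results),
    }


sessions = [SessionResult(True, 2, 6), SessionResult(False, 0, 4)]
summary = summarize(sessions)
```

Computing the same summary separately for early and late sessions would be one way to look at the stabilization trend described in the results.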

Personal note. In many respects this shares a lot with the paper we just put out. Deciding what to store in memory, and showing why that choice is valid, seems to be the central concern of both this work and ours. That said, because this work uses user feedback as the RL training signal, it will likely be somewhat limited in generalizability. (The actual drop in mixed-domain performance seems to fit the same picture.) The key assumptions and premises are very similar to our work, but the direction taken to solve the problem is slightly different, so I will look at it more closely. (It is fun that the tables report results in the same way we had been considering this time.)